1.1 Scope of Study

Many state-of-the-art studies have been criticised
because of their narrow definition of the subject covered. In an
attempt to avoid this, coordinate indexing in this study has been
defined to include all systems in which the logical operations of
intersection, union, and negation are brought into play in the
manipulation of index terms. This definition therefore covers nearly
every mechanized system. It includes both term-on-item and
item-on-term systems. It excludes only those systems which are
precoordinated, i.e., nonmanipulative.

Among the many names given to coordinate indexing are correlative
indexing, multiple aspect indexing, concept coordination, and "Enriched
Co-ordinate Indexing."

Just as the definition of coordinate indexing was made as broad as
possible, so also were subsidiary definitions. For example, whether
systems use Uniterms, keywords, descriptors, unit concepts, or
structerms, they are included in this study. A distinction is made
between pure and constrained systems, but not so as to eliminate the
latter from the study.

The same broadness in definition is allowed in the study of the
various constraints. For example, in the chapter on roles and links,
although there is an attempt to distinguish among the various kinds of
roles and links, none is omitted because it does not fit a particular
definition.

In any state-of-the-art study, there is a danger of not having
given full credit and not having done justice to all existing work, to
say nothing of the inevitability of accidental exclusion. Doubtless
this study will suffer from all of these defects. It is particularly
regretted that the study is limited to the United States.

1.2  Arrangement  of  Report 

There are unavoidable overlaps among the various sections of
this report. To maintain completeness within each section, a high
degree of redundancy has purposely been introduced, and material from
other sections is either summarized, repeated, or referenced.

Each section has its own reference footnotes. In addition, at the end
of the report is an annotated bibliography of the more significant
papers that were published between 1947 and 1960.

2. THE LITERATURE OF COORDINATE INDEXING

The materials listed in this survey are selected from a much
larger literature which has developed in this field in the past fifteen
years. The reader will note certain omissions and will have a right to
inquire concerning the basis of selection.

The primary basis for selection was whether or not the document
contributed to the state of the art of coordinate indexing.

For example, this is not a study of "machine literature searching".
There are many papers on the use of machines which make no contribution
to the logic of machine search. Because coordinate indexes are
manipulative and are predicated on logical operations, such papers are
not included in the bibliography (although many have been read during
the course of this study).

Since 1947 (the earliest paper in the bibliography) the number of
workers in the field and the number of papers have grown enormously.
From an odd and esoteric concern of a few people, the field of I. R. has
grown until it has even been noticed by Fortune, which says:

"A number of I. R. machines have already been devised and
built. There are also in use systems that have adapted
for I. R. purposes conventional punched-card machines and
computers. I.B.M. estimates that industry is now spending
about $2 million a year on systems of the latter sort. By
1965, according to I.B.M., the expenditure on I. R. will
jump to over $100 million a year, and thereafter double
every three years." *

Any selection process involves a value judgment, and the reader,
if he wishes, can insist that such evaluations are subjective, private,
and lack objective validity. There are, however, certain internal
criteria which a good bibliography can use, especially in a field as
new and active as this one. As the field has grown, it has been
notorious for the controversies it has engendered and for the large
number of self-proclaimed "experts" it contains. One basis of
objectively evaluating the literature in a field and the experts who
produce it is by ascertaining how many of the experts read the works of
the other experts. In other words, if bibliography is significant, it
must be assumed that, in general, a man who writes about any subject
without any knowledge of or reference to other writings on the subject
is not likely to be making a significant contribution to it. There are
always exceptions to such a conclusion. Any field has its small
percentage of meteoric personalities who may enter the field glowing
like a comet. Out of their own internal genius or knowledge they may
provide new insights which are important and effective. But since
knowledge in general, and science in particular, is cumulative,
in most cases it is reasonably safe to disregard the "contribution"
which is made either with complete unawareness of or complete unconcern
for the literature in the field.

* Fortune, September 1960, pp. 162-192.

In another context, Etienne Gilson described certain innovators
as being "in that blessed state of ignorance which makes it easy for
a clever man to be original." It is the cream of the jest that in this
field, basically concerned with the value of information and the means
of retrieving it so that science in general can be an orderly,
cumulative process, there is a high percentage of papers which appear
without footnotes and without reference to the works of others. The
elimination of many such papers from this bibliography has a scientific
and objective basis.

3. COORDINATE INDEXING AND CLASSIFICATION THEORY - THE DEVELOPMENT
OF COORDINATE INDEXING

3.1 General

In this historical account of coordinate indexing, as in accounts
of similar developments, the most difficult task is arriving at some
statement of "how it all began." Most intellectual movements do not
begin ab ovo but usually have roots which can be traced as far back as
anyone has energy to trace them. But the account must begin somewhere,
and it will be helpful in determining this place if there is first set
forth a definition of coordinate indexing which will distinguish it
from other forms, namely, alphabetical indexing and systematic
classification, which go back almost to the beginning of intellectual
history. Even if coordinate indexing is considered to be only a variant
of these two older forms, it remains necessary to trace the beginnings
of this variant, if only to determine how it has developed from these
beginnings. Hence, the first task is to define coordinate indexing in a
manner which distinguishes it quite sharply from alphabetical indexing,
on the one hand, and systematic hierarchical classification on the other.

3.2 Class Manipulation

One of the essential characteristics of coordinate indexing is
that it is manipulative, as opposed to fixed. For example, in a system
of alphabetical subject headings or in standard alphabetical indexing,
each heading is fixed and complete. Any item in such systems may be
indexed by a plurality of headings; but it is not necessary to "match",
"coordinate", or "intersect" headings in a search of the system. (The
logic of such intersection or matching will be discussed below.)

Quite early in the development of coordinate indexing it was
pointed out that subject headings, like "liver - radiation injuries"
and "gamma rays - pathological effects",1 were each intended to be
complete and to provide a fixed and complete mode of access to the
item indexed. A system using such headings is essentially based on the
assumption that the headings can be as complete as required and that it
is never necessary to intersect headings or to use two or more headings
to retrieve any indexed item. Similarly, with reference to
classification systems, with or without notations, any subclass is
defined by all the classes of which it is a subclass; and
classification systems do not require, in any instance, that a search
be made by looking for identical members of different subclasses. On
the other hand, it is the essence of coordinate indexing that a search
be made by performing operations on the classes defined by the headings
in the system. It is because of these operations that the word
"manipulative" is apt and descriptive as a characterization of systems
of coordinate indexing. (The credit for this term should be given to
Charles Bernier, formerly of Chemical Abstracts and now of ASTIA.)2


It is now fairly generally accepted that many different types of
machines can be used to perform the manipulations required in systems
of coordinate indexing. On the other hand, in terms of the question of
origins, does the use of machinery in coordinate indexing imply that
coordinate indexing is traceable back to the first recorded descriptions
of the use of punched cards or edge-notched cards for indexing
collections of data or information? It may be thought that the answer to
this question does not matter too much. One could say that coordinate
indexing began when manipulative indexing began, that is, when any
device was used, such as a sorting machine or a set of needles, to
select items from a file. However, this conclusion would hide the fact
that the meaning of "manipulative" is wider than the meaning of
"coordinate indexing". What is manipulated in a system of coordinate
indexing are classes, in order to deliver certain intersections (or
other logical functions) of classes. It is possible to manipulate a
file for quite other reasons. For example, one might use a sorting
machine to organize a file into an alphabetical array; or one might use
a set of needles to select from a file all items indexed under any
particular class designation. In other words, although coordinate
indexing implies the use of some type of mechanism for manipulating
classes, the use of a manipulative mechanism does not imply that a
system using such a mechanism is a system of coordinate indexing.
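
The distinction can be illustrated with a short sketch in Python. It is
only an illustration under stated assumptions: the postings table, the
item numbers, and the function names select_by_heading and
coordinate_search are hypothetical and are not taken from any system
described in this report. The first function manipulates the file
without coordinating anything; the second operates on the classes
themselves.

```python
# Hypothetical file: each term maps to the set of item numbers posted on it.
postings = {
    "liver": {1, 4, 7, 9},
    "radiation": {2, 4, 9, 12},
    "injuries": {4, 5, 9},
}


def select_by_heading(postings, term):
    """Manipulative but not coordinate: pull every item under one class."""
    return sorted(postings.get(term, set()))


def coordinate_search(postings, terms):
    """Coordinate: operate on the classes themselves, here by intersection."""
    classes = [postings.get(term, set()) for term in terms]
    if not classes:
        return []
    return sorted(set.intersection(*classes))


single = select_by_heading(postings, "liver")
joint = coordinate_search(postings, ["liver", "radiation", "injuries"])
```

Only the second call depends on a logical function of more than one
class; the first could be carried out equally well with a needle and a
deck of edge-notched cards.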
3.3 Class Intersection

In many of the early descriptions of edge-notched card systems
and punched card systems the advantages claimed for such devices were:

(1) They could eliminate the requirement of
maintaining the files in any organized array.

(2) They could eliminate the necessity of multiple
cards for items indexed under multiple
headings.

If systems are used in connection with machines for these purposes and
these purposes alone, they cannot be considered systems of coordinate
indexing. For example, Mooers' careful delineation of descriptors and
his development of superimposed coding for multiple indexing in
restricted coding fields, while important contributions to the art of
information retrieval, do not by themselves make Zatocoding into a
system of coordinate indexing. Of course, if a Zatocoding system is
designed and set up for the purpose of making searches on the logical
functions of descriptors, then such a system would be a system of
coordinate indexing.

One of the earliest papers to recognize that mechanical devices
could be used to intersect classes was a paper presented by Dr. J. E.
Holmstrom at the Berne Conference of the International Federation for
Documentation in 1947 and reprinted as Paper No. 33 in The Royal
Society Scientific Conference.3 In this paper Holmstrom specifically
distinguishes mechanical indexing from classification and alphabetical
indexing in the following terms:

"Punched card systems have been allotted a separate
place in our diagram as they differ fundamentally
from any of the above. Requiring no regular filing
sequence at all, nor any cross references or multiple
entries, they offer, in the following respect, an
outstanding advantage over any other system: the
possibility of sifting out from an accumulated mass
of records, at any time, any desired conjunctions
of information..."4

Here we see the recognition that punched card equipment can eliminate
the necessity for ordered files and the necessity for multiple cards;
but there is also the recognition that such devices can be used for the
"conjunction" of headings. Holmstrom, in this paper, also recognizes
that search by conjunction of headings was recommended by Batten
in a 1947 paper, "A Punched Card System of Indexing to Meet Special
Requirements."

Although it is clear that Holmstrom recognized that machines could
be used for a single search on a number of headings, he did not
emphasize this point because he believed that the relationships among
terms were more important than the terms themselves. Thus, he indicated
that punched cards could be used for handling material classified
according to UDC numbers. Such numbers would preserve the relationships
of classes to one another and still make it possible to index an item
under a plurality of different class headings on one card.

3.4 Boolean Functions

At the Third Plenary Session of the Royal Society Conference,
Dr. Holmstrom, in his general remarks, referred to the Berne paper and
reiterated the possibility of searching for an item by the two terms,
"man" and "dog", but he goes on to say that searching for "man" and
"dog" is not as effective as searching for "dog bites man" and he
emphasizes that

"... it is precisely these relations between the
concepts, not the concepts themselves, which we
want to be able to select from the total mass of
literature. Unless we have some means of doing
that there is perhaps some danger of falling into
the misconception of Mark Twain's over-sanguine
young journalist (I think the story is Mark Twain's)
who undertook to produce at short notice an article
on Chinese philosophy which happened to be urgently
required, and when asked how he proposed to do it
said the task was quite simple: he would look up
'China' in the encyclopaedia and he would look up
'philosophy' in the encyclopaedia, and then he
would have merely to combine his material."

(The story is, in fact, Dickens'.)

It is fair to say that by thus emphasizing relations which may or
may not be important in the indexing problem, Holmstrom failed to
recognize the logic underlying the pure form of coordinate indexing.
Calvin Mooers has called to our attention the fact that the early
Hollerith patents of 1884 and 1889 and the Taylor patent of 1950
exhibited a recognition of the application of the logic of classes to
search and selection by multiple terms, although they did not use the
logic of classes explicitly in describing the operation of their devices.
The earliest explicit recognition of the logic of search which has been
found is a paper by Fairthorne, "The Mathematics of Classification."

The work of H. P. Luhn with indexing systems and the Luhn Scanner
also began in the late '40's. In a paper prepared in 1948 he concerned
himself with retrieval of material by any combination of terms coded on
a card and also with the development of free field coding for such terms.
In addition to this early work, he made a major contribution to
coordinate indexing, namely, "KWIC" indexing, which is discussed in the
conclusion of this section.

The work of Documentation Incorporated in this area began in 1952,
but this work was in part based on an earlier paper, "The Coordinate
Indexing of Scientific Fields," read by Mortimer Taube before the
Symposium on Mechanical Aids to Chemical Documentation of the Division
of Chemical Literature of the American Chemical Society, September 4, 1951,
and a technical report prepared by Irma Wachtel in 1951 for the Atomic
Energy Commission (TID-469). The paper on "The Coordinate Indexing of
Scientific Fields" was stimulated, in turn, by Gilbert King's paper,
"Application of Card Methods to Scientific Computation," especially the
following passage:

"The capabilities of the machines could be learned by
examples, but it would be much more satisfactory if the algebra
of the machines were worked out. The scientist in deciding
how to solve a problem and the detailed programming of number
(i.e., cards, through the machines) is not ... interested in
the circuitry, and he would only be unnecessarily limited
by examples. He needs facts, namely, what can be fed into
the machine, what can happen to this information, and what
can come out. The machine carries out operations of symbolic
logic, and for proper coding and programming the scientist
should know the basic algebra of the black boxes into
which he will put cards. At present, an installation of
punched-card machines requires a supervisor who is thoroughly
acquainted with what the machines can do and equally at home
with the science of the problem. It would be much more
efficient if every scientist on the staff knew what the
machines can do. This can be accomplished now only by those
who learn by experience; this is a long process, and one
which requires continual practice to insure efficiency.

On the other hand, if the algebra or symbolic logic were
formulated, it could be rapidly learned and readily applied."8

In the 1951 paper, "The Coordinate Indexing of Scientific Fields,"
the utility of machines which can manipulate classes of items in order
to determine Boolean functions of such classes is made to rest upon a
hypothetical statement. This hypothesis was stated as follows:

"If any given scientific field can be analyzed into simple
terms or ideas so that all or most of the concepts of the
field can be expressed in these terms or in truth functions*
of them (i.e., logical combinations) then mechanical aids
can make important contributions to organizing and searching
for information in such a field... In order to see more
clearly why this is true, we can turn now to a more formal
presentation of the nature of coordinate indexing for
machines... In a field, d, with terms a, b, c, etc.,
coordinate indexing allows only the relations

    a
    not a
    a and b
    a or b."9

"(In terms of machine searching, if we search for
'a or b' we will get everything that is 'a and not
b', everything that is both 'a and b'. The search
will reject only those things which are 'not a and
not b'.)"10

* For further development of these ideas, see the distinction between
logic and mechanics of retrieval in "Studies in Coordinate Indexing,"
Vol. III.
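
The four relations in the quoted passage correspond directly to
elementary set operations. The following Python sketch is offered only
as an illustration: the field d of ten items and the classes a and b
are invented for the purpose. It forms each relation and separates the
items delivered by a search for 'a or b' from the items rejected.

```python
# Hypothetical field d (all items in the file) and two term classes a and b.
d = set(range(1, 11))
a = {1, 2, 3, 4}
b = {3, 4, 5, 6}

not_a = d - a      # negation, taken relative to the field d
a_and_b = a & b    # intersection
a_or_b = a | b     # union

# The union is made up of three disjoint parts ...
only_a = a - b     # 'a and not b'
both = a & b       # 'a and b'
only_b = b - a     # 'b and not a'

# ... and the search rejects only the items that are 'not a and not b'.
rejected = (d - a) & (d - b)
```

The rejected set is exactly the complement of the union within the
field, which is the point the quoted passage makes about a search for
'a or b'.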

The initial Uniterm experiment, conducted by Documentation
Incorporated for ASTIA, was an attempt to determine whether or not a
specific body of literature could be indexed by assigning to each
document in the collection a subset of a set of terms with the
restriction that a search for any document could only be made by a
logical operation on the terms of the subset. The hypothesis expressed
in the 1951 paper was repeated in a series of lectures given at the
University of Chicago in the Spring of 1952.*

Most of the early criticism of the pure system of coordinate
indexing was based on the failure to recognize its experimental or
hypothetical nature. It was always recognized that the elimination of
syntax and the restriction of relations between terms to pure Boolean
functions might lead to some degree of noise in the retrieval process.
The initial experiments were designed to find out how much noise.

* Since the 1951 paper was unpublished, Dr. Shera, with the knowledge
and approval of the author, presented the hypothesis in a paper given
at a symposium at Columbia University in 1953. It is interesting to
note that in a paper by Jack Morris, published in American
Documentation in 1954, which in general is most critical of the Uniterm
System, the same hypothesis is stated with complete approval.11

In considering the subsequent history of coordinate indexing,
with its introduction of various types of constraints, such as
pre-coordinations, roles, links, categorization of terms, etc., it is
important to realize that in its initial presentation, the theory was
presented in the pure form, that is, with maximum freedom among the
terms. It was felt that a system with this degree of freedom was
necessary to provide compatibility between the differently structured
CADO system and the TID system, which were to be combined in the then
recently established ASTIA. (Cf. Section 5.5.) The use of Boolean
algebra to describe the relationships among the terms in a pure system
of coordinate indexing is not accidental, nor even pedantic, as has
been supposed by some critics.12 It was used to emphasize the
possibility of constructing an indexing system which would dispense
with syntactical relationships of any kind among terms and which would
restrict itself to the class operations allowed in a Boolean algebra.
For example, even the noncommutative relationship of ordered pairs,
which can be handled in an extended theory of classes, was ruled out of
pure coordinate indexing on the grounds that such relationships are
outside the domain of a restricted Boolean algebra. Since the theory
was presented in a pure form, it followed that subsequent innovations
could not be in the direction of making the system purer, i.e., freer,
but only in the direction of adding constraints of one kind or
another. (See Paragraph 3.7, on Luhn's keyword-in-context indexing.)

Before going on to consider the history of attempts to reintroduce
various types of constraints into coordinate indexing, it is important
to point out the relationship between certain conclusions in
information theory and coordinate indexing. It must be stated that
although these relationships were not immediately evident during the
early days of the work in coordinate indexing, they were noted
subsequently and used to throw light on the advantages of pure
coordinate indexing.

3.5 Information Theory

Calvin Mooers and others had noted that certain conclusions of
information theory were relevant to coding problems, especially
problems of superimposed coding. The formulas used to determine
the optimum depth of superimposed indexing in limited coding fields on
cards are analogous to the formulas used in calculating the amount
of information which can be transmitted through limited channels. The
relationship between information theory and coordinate indexing noted
by Documentation Incorporated was not this relationship between channel
capacity and coding, but between questions of maximizing information
and the free form of coordinate indexing. In other words, the
hypothesis that an indexing system could be based upon Boolean
relations of classes is not ad hoc. Rather, it arises from the fact
that such a system is in principle the simplest and most economic,
providing only that it does not lead to too much noise. An example can
illustrate what is meant here: Suppose a coordinate indexing system to
consist of a vocabulary of 2,000 terms, and let it be supposed that in
using such a system for indexing and searching, it is always discovered
that the amount of noise delivered by any search is negligible. It
would obviously be redundant to set up in such a system additional
terms which consisted of pre-coordinations of terms selected from the
basic vocabulary. It must be noted, however, that as in information
theory, if any constraint is definitely predictable, such a constraint
can be used in the design of a coordinate indexing system and will lead
to a more efficient system without increasing costs. For example, if in
any system there were 5,000 documents on guided missiles and no
documents on unguided missiles and no documents on guidance of anything
other than missiles, it would certainly be silly to set up a term for
"guided" and a term for "missiles." This conclusion has been expressed
symbolically in a discussion of what terms should be free and what
terms not free in a pure system of coordinate indexing.13
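
The guided-missile example can be made concrete with a small sketch in
Python. It is a toy illustration, not the symbolic treatment referred
to in the text: the postings table and the function name
redundant_pairs are hypothetical. Two terms whose postings are always
identical convey nothing separately, so a single pre-coordinated term
would serve as well.

```python
def redundant_pairs(postings):
    """Return pairs of terms whose (non-empty) posting sets are identical."""
    terms = sorted(postings)
    pairs = []
    for i, first in enumerate(terms):
        for second in terms[i + 1:]:
            if postings[first] and postings[first] == postings[second]:
                pairs.append((first, second))
    return pairs


# Hypothetical collection: every guided item is a missile item and vice versa.
postings = {
    "guided": {1, 2, 3, 5},
    "missiles": {1, 2, 3, 5},
    "propulsion": {2, 4},
}
pairs = redundant_pairs(postings)
```

In a real collection the constraint would rarely be exact, and the
designer would have to decide how nearly predictable it must be before
the two terms are merged.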

Aside from this relationship to the theoretical conclusions from
information theory and conclusions derived from the simple elegance of
Boolean algebra, undoubtedly one of the motivating factors behind the
hypothesis was the recognition that in ordinary English the order of
words is more often redundant than not. It was recognized that in
ordinary discourse, dispensing with word order and prepositions would
certainly lead to inelegances and difficulties in the transfer of
meaning. On the other hand, it was supposed that for retrieving
information from a file, order and prepositional or syntactical
relations could be dispensed with at considerable saving without
appreciably increasing the noise in the system.

3.6  Introduction  of  Constraints 


Various subsequent innovations in the system of pure coordinate
indexing in effect represent various types of constraints. Some of
these constraints were the subject of experimentation by Documentation
Incorporated. Similar and additional constraints have been suggested
by other organizations and have been described in the literature which
has appeared on this subject. It is this literature which is set forth
in the appended annotated bibliography. The constraints have been
variously named and described:

(1) Categorization

The categorization of terms in a vocabulary involves
the division of a vocabulary under a number of generic
headings. These generic headings can be thought of
as supplying a grid or a set of questions put to any
document in order to insure complete and uniform
description of each document. Documentation Incorporated,
as a result of its research, concluded that such categories
were only useful in limited and carefully defined fields
of information or in data processing, as distinct from
information handling, systems. In its own experience
it has successfully used categorization in three projects:
the Nuclide Index, Project ECHO for the Air Force Office
of Scientific Research, and the Cancer Chemotherapy data
processing system for the National Institutes of Health.

Other descriptions of categorization are also included
in the bibliography.

(2)  Pre-Coordination 

The particular distribution of documents in any collection
might indicate that certain terms initially treated as
separate terms should be pre-coordinated in order to cut
down the time requirements for searching. This device of
pre-coordination has been alluded to above in discussing
the use of the term "guided missiles", rather than the
terms "guidance" and "missiles". This technique can be
followed as far as experience warrants. A pre-coordination
essentially creates a more specific term than the terms which
make it up. Being more specific, it has the effect of adding
another term to the vocabulary and cutting down, by so doing,
the average number of postings on each term in the vocabulary.
At the opposite end of the spectrum, experience may
disclose a number of highly specific terms with a very low
incidence of posting. Periodic examination of such terms
may indicate several possibilities:

(A) The terms may represent synonyms for other
terms in the system and can be eliminated in
favor of cross-references. Duplicate posting
on synonyms may also be employed.

(B) The terms may be important terms which
just happen to be relevant to only a few documents
in the system. If so they must be retained,
even though they represent an increase in the
vocabulary size.

(C) Finally, lightly posted terms may represent
oblique synonyms which create a problem for the
systems designer. He must determine whether
they are useful enough to be retained or whether
they can be eliminated in favor of combining
their postings with the postings on other, more
heavily used terms in the system.

It should be apparent that in a system of coordinate indexing
the danger points are the two extremes, namely, terms with a
great many postings, and terms with too few postings. It can
be assumed that in the middle of the spectrum, the terms which
have been used frequently by the indexer will also be those
terms used frequently by the searcher, since in both cases
they represent terms used frequently in the literature. The
systems designer must, then, continually attend to the
extremes as his system grows.
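The periodic review of the two extremes, and the effect of a pre-coordination on the average number of postings, can be sketched in a few lines of Python. This is an illustration only and not part of the systems described: the thresholds, the sample postings, and the function names are all assumptions, and the pre-coordination step assumes that a document posted on the combined term is removed from both component terms.

```python
def flag_extremes(index, low, high):
    """Return (heavily posted terms, lightly posted terms)."""
    heavy = sorted(t for t, docs in index.items() if len(docs) >= high)
    light = sorted(t for t, docs in index.items() if len(docs) <= low)
    return heavy, light


def precoordinate(index, first, second, combined):
    """Move documents posted on both terms to a new, more specific term."""
    shared = index[first] & index[second]
    result = {term: set(docs) for term, docs in index.items()}
    result[first] -= shared
    result[second] -= shared
    result[combined] = shared
    return result


def average_postings(index):
    """Average number of document postings per term."""
    return sum(len(docs) for docs in index.values()) / len(index)


# Hypothetical postings: term -> set of document numbers.
index = {
    "missiles": {1, 2, 3, 4},
    "guidance": {1, 2, 5},
    "telemetry": {3, 5},
    "gyrostabilizers": {4},
}

heavy, light = flag_extremes(index, low=1, high=4)
revised = precoordinate(index, "guidance", "missiles", "guided missiles")
print(heavy, light)
print(average_postings(index), average_postings(revised))
```

The lightly posted terms reported here would still need the human judgment described in (A) through (C); the sketch only locates the candidates.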


(3)  The  Association  of  Ideas  and  Thesauri 

Quite early in its work Documentation Incorporated realized
that a free vocabulary might present certain problems to the
searcher unfamiliar with the vocabulary. This problem can
be handled, in part, in any system which employs categories
because each category can be displayed to the searcher. If
categories are not employed, the searcher faces the problem
of knowing what terms to ask for. For example, a searcher
approaching the system with any particular term in mind
may not know what terms to coordinate with it, that is,
what terms in general have been used with any given term
in analyzing items in a collection. As an example of how
this problem might be solved, a device known as "EDIAC"
(Electronic Display of Indexing Association and Content)
was created. This device consisted of an electrified panel
having a vocabulary and a set of numbers. If any word in
the vocabulary was selected as the beginning of a search,
there would also be displayed on the panel the logical sum
of all other words used in indexing any document indexed by
the word first entered into the device. This display would
enable the searcher to select an additional word to coordinate
with the first word. He could do this with assurance that the
system contained at least one document indexed by both terms.
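The logical sum displayed by such a panel can be illustrated in software. The following Python sketch is not a description of the device itself; the document numbers, the terms, and the function name are hypothetical.

```python
def associated_terms(doc_terms, start):
    """Logical sum of all other terms on documents indexed by `start`."""
    related = set()
    for terms in doc_terms.values():
        if start in terms:
            related |= terms
    related.discard(start)
    return related


# Hypothetical tracings: document number -> terms assigned by the indexer.
doc_terms = {
    101: {"lead", "coatings", "corrosion"},
    102: {"lead", "shielding"},
    103: {"coatings", "paints"},
}

display = associated_terms(doc_terms, "lead")
print(sorted(display))
```

Every term in the result shares at least one document with the starting term, which is the assurance the text describes.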

This display of the vocabulary and its connections has
some relationship to the process of redirecting a search
as a result of browsing in a manual catalog. An initial
search in a manual catalog might disclose not only a
number of items but also give information to the searcher
concerning what other terms he should use in further steps
in his search. The display of vocabulary on the EDIAC
does exactly this. At least two thesauri which have
been compiled are intended to provide information about
the relationships of terms in an indexing system.
These are the ASTIA and A.I.Ch.E. thesauri. Closely
related to thesauri are semantic or generic codes. WRU
uses the former and the U.S. Patent Office has studied
the latter. These, as well as the thesauri mentioned
above, are described in greater detail in Paragraph 5.2
of this report.

(4) The Use of Roles and Links

Quite early in the development of free forms of coordinate
indexing which utilized only the logical functions of
intersection, union, and negation as connections among terms,
it was realized that the search of such systems would
deliver a certain amount of noise or false answers.
Consider, for example, a document containing information
on "lead as a coating" and another document on "coatings
for lead". If the system has terms for only "lead" and
"coating", the system will deliver noise. It is to avoid
problems of this kind that there have been advanced in the
past years the devices known as role indicators and links
as a means of supplying syntactical correctives in
addition to logical correctives in an indexing system.

Roles and links are described in greater detail in
Section 6 of this report.
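The "lead" and "coating" example can be made concrete with a short Python sketch. It is an illustration under stated assumptions: the role names ("material applied", "material treated", "function") are hypothetical and are not the role indicators of any system discussed in this report.

```python
def coordinate(index, *terms):
    """Intersect the postings of every requested term."""
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings) if postings else set()


# Document 1: "lead as a coating"; document 2: "coatings for lead".
plain_index = {"lead": {1, 2}, "coating": {1, 2}}

# The same postings with a hypothetical role attached to each term.
role_index = {
    ("lead", "material applied"): {1},
    ("lead", "material treated"): {2},
    ("coating", "function"): {1},
    ("coating", "material applied"): {2},
}

noisy = coordinate(plain_index, "lead", "coating")
precise = coordinate(
    role_index,
    ("lead", "material treated"),
    ("coating", "material applied"),
)
print(noisy, precise)
```

A searcher interested only in coatings for lead receives both documents from the plain index, but only document 2 once the roles are coordinated as well.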

3.7 Conclusions

The development of a discipline sometimes resembles an ascending
spiral, rather than a straight line, and in this development of
constraints in coordinate indexing, there has been in some sense a
return to the beginning.

H.P. Luhn has suggested a method of extracting keywords from
a document and displaying them in an index in a context of other
terms.^14 The context, in this case, constitutes a constraint, that
is, a method of narrowing the significance of the keyword and making
it more specific. At the same time, it must be recognized that the
selection of keywords by machine matching, as opposed to the use of
Uniterms, descriptors, etc., by an indexer, may be considered an
advance in the direction of freedom. The original form of coordinate
indexing certainly envisioned that the terms are selected not by a
pure matching process or on the basis of any statistical counts,
but by trained indexers who use their own background and understanding
of the meaning of the document as determinants of the terms selected.

To substitute for this rational process a process of machine matching
based upon statistical counts or other considerations relevant to
information theory which can be programmed into a machine is to go
beyond the original form of coordinate indexing in the direction of
"digitalizing" information. In other words, the original form of
coordinate indexing depends upon formal considerations only for the
matching of terms and relies on intuitive considerations for their
selection or creation. Keyword-in-context indexing relies on mechanical
means for selecting the terms themselves from titles. (It should be
noted here that the KWIC technique is presently more useful as a quick
and convenient announcement technique for relatively small numbers of
accessioned documents than as an accumulative index useful as a tool
for detailed information retrieval. However, as authors recognize the
need for and construct more informative titles, the KWIC technique
might have broader capabilities.) Nevertheless, by showing how the
context of terms can fulfill the same role as other more artificial
syntactical constraints, Luhn moves coordinate indexing back in the
direction of freedom, the direction in which information theory
indicates it should go.
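The mechanical selection of keywords from titles can be sketched as follows. This Python example is a minimal illustration and not Luhn's program: the stop-word list, the column width, and the sample titles are assumptions.

```python
# Hypothetical list of words that are never treated as keywords.
STOP_WORDS = {"a", "an", "and", "as", "for", "in", "of", "the", "to"}


def kwic_index(titles, width=24):
    """Return sorted (keyword, context line, document number) entries."""
    entries = []
    for doc_id, title in titles.items():
        words = title.split()
        for position, word in enumerate(words):
            keyword = word.lower()
            if keyword in STOP_WORDS:
                continue
            # Text before the keyword is right-aligned; the keyword and
            # the text after it start in a fixed column.
            left = " ".join(words[:position])[-width:]
            right = " ".join(words[position:])[:width]
            entries.append((keyword, left.rjust(width) + "  " + right, doc_id))
    return sorted(entries)


titles = {1: "Coatings for lead", 2: "Lead as a coating"}
for keyword, line, doc_id in kwic_index(titles):
    print(f"{line}  [{doc_id}]")
```

Each keyword is shown with the words that surround it in the title, so the context does the narrowing that a role or link would otherwise supply.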

Unfortunately, it is now fashionable in the field of information
retrieval to deny the relevance of information theory because its
application is limited to engineering and operating problems in the
I.R. field. The bulk of present research in the information retrieval
field seems to be concentrated in the domains of linguistics and
semantics, even though no less an authority than Quine has warned
that the search for the meaning of meaning carries an investigator
beyond positive science into the arcane field of metaphysics.*


*Quine, Willard Van Orman, From a Logical Point of View, Cambridge,
Harvard University Press, 1953. "Lexicography is concerned, or
seems to be concerned, with identification of meanings, and the
investigation of semantic change is concerned with change of
meaning. Pending a satisfactory explanation of the notion of
meaning, linguists in semantic fields are in the situation of
not knowing what they are talking about."


4. VOCABULARY GENERATION


4.1 General

The vocabularies of retrieval terms employed by various
operators of coordinate indexing systems have evolved in several
ways. When the initial development of these vocabularies is considered,
however, the modi operandi are found to fall into only a few broad
classes, and the seeming variations in developmental techniques
within the few broad classes are seen to be merely matters of degree.
These variations in degree have resulted from different environmental
and capability factors with which the system developers have had to
contend. For example, adequate funding and development time permit
much more vocabulary refinement and "editing" to be undertaken than
do inadequate funding and timing. Similarly, personnel experienced
in coordinate techniques will (given the same time and money) develop
more refined vocabularies than will inexperienced personnel. Thus,
in many (probably most) instances of vocabulary development, the
degree of refinement initially achieved depended only indirectly
upon a knowledgeable desire for adequate refinement (i.e., upon
rationally chosen goals) but more often upon comparatively unrelated
environmental factors. This statement, of course, is almost
impossible to prove; system operators are understandably loath to
admit fully the effects of the exigencies of their situations during
the embryonic stages of system development.
As  noted  above,  coordinate  vocabularies  have  been  developed  via 
only  a  few  basic  routes:  (1)  empirically,  (2)  from  treatment  of 
existing  subject  heading  lists,  or  (3)  from  treatment  of  existing 
classification  schemes.  These  three  methods,  and  some  variations, 
are  described  below. 

4.2  Empirical  Development  of  Vocabularies 

The empirical development of vocabularies is typically based
upon the technique of "free indexing." By "free indexing" is meant
the assignment as index terms (for a given document) of words or
phrases chosen by the indexer (based on judgment of relative importance)
from the set of words or phrases conceived by the indexer to be
appropriate indexing terms even though they may not have been employed
in the document by the author, i.e., words or phrases expressing
what the indexer considers to be the true meaning of the ideas
expressed in the report.

The set of terms resulting directly from "free indexing" a
collection of documents can be considered as the most rudimentary
form of a coordinate index vocabulary; such a set of terms has
usually been considered as preliminary in that a certain amount of
refinement effort has nearly always been expended upon the "raw"
vocabulary. For example, attempts (varying in their orderliness
and effectiveness) have usually been made to detect synonyms and to
create "see" references or their equivalent.

The initial development of the du Pont Engineering Department
vocabulary is typical (perhaps slightly "better" than typical) of
initial vocabulary development efforts. This project was well
documented shortly after the work had been completed.^1

The report on the du Pont development implies several important
environmental considerations. First, funding was adequate for
the mounting of a major, concerted effort. Second, the time permitted
for completion of the work was limited. Third, the personnel involved
in the effort were largely inexperienced in coordinate techniques,
although they did engage the advice of a firm in the documentation
field. Briefly, the development proceeded as described below.

A complete subcollection of 2,100 reports was chosen for test
because of both breadth of technological coverage and importance of
the reports (the particular subcollection having had codes
assigned to its members indicating such relative importance). Indexing
was carried out mostly from the report summaries (two to five pages
in length) but in a few instances from abstracts only. The report
states that when a phrase was chosen as an indexing term, each
word of the phrase was also used as an entry (e.g., "shielded arc
welding", "arc welding", "welding") although examination of tracing
sheets indicates that not all indexers may have followed this rule
consistently.
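The phrase rule can be illustrated briefly. The sketch below follows the example given in the report, in which the phrase is entered together with its successively shorter trailing forms; reading the rule that way, rather than as one entry per individual word, is an assumption, as is the function name.

```python
def phrase_entries(phrase):
    """Return the phrase followed by its shorter trailing forms."""
    words = phrase.split()
    return [" ".join(words[start:]) for start in range(len(words))]


entries = phrase_entries("shielded arc welding")
print(entries)
```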


Thus far, the du Pont development was typical of the empirical
method. The next step, however, employed data processing equipment
not available to many system developers. Lacking such equipment, many
system developers were resigned at this point to posting manually
from the original tracings, entering each entry on each tracing on
the appropriate Uniterm card, and then examining the result for
synonyms. At du Pont, however, the tracings were keypunched,
one card being made for each entry on the tracing. The
28,000 resulting cards were then sorted alphabetically and numerically
by machine. They were listed alphabetically by index term and by
report number within index term. It was found that 9,400 "unique"
index terms had been employed by the indexers.
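The keypunch-and-sort step amounts to producing one record per entry and ordering the records by term and then by report number. A minimal Python sketch follows; the tracings and report numbers are hypothetical and the function names are illustrative.

```python
def punch_cards(tracings):
    """One (term, report number) card per entry, sorted by term then report."""
    cards = [(term, report) for report, terms in tracings.items() for term in terms]
    return sorted(cards)


def unique_terms(cards):
    """Distinct index terms appearing in the sorted listing."""
    return sorted({term for term, _ in cards})


# Hypothetical tracings: report number -> entries chosen by the indexer.
tracings = {
    2001: ["welding", "valves"],
    2002: ["corrosion", "valve"],
    2003: ["welding", "arc welding", "corrosion"],
}

cards = punch_cards(tracings)
for term, report in cards:
    print(f"{term:<12} {report}")
print(len(cards), "cards,", len(unique_terms(cards)), "unique terms")
```

Note that "valve" and "valves" are counted as two terms here, which is why the count of "unique" terms in such a listing overstates the size of the eventual vocabulary.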

The listing was then examined to permit combination of terms
which should have been identical except for misspellings and
inconsistent assignment of singular and plural forms, etc. After this,
each term in the listing was transferred manually to a 5" X 8"
Uniterm card; the vocabulary was thus reduced to about 4,000 terms.
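An examination of this kind can be assisted by grouping terms under a crude normalized key and presenting the groups to an editor. The Python sketch below is an assumption-laden illustration: the normalization rule (lower-casing and dropping a plain final "s") is deliberately naive, it does not detect misspellings, and the sample terms are hypothetical.

```python
from collections import defaultdict


def crude_key(term):
    """Lower-case the term and drop a plain plural 's' from its end."""
    key = " ".join(term.lower().split())
    if len(key) > 3 and key.endswith("s") and not key.endswith("ss"):
        key = key[:-1]
    return key


def candidate_groups(terms):
    """Group terms sharing a crude key; only groups of two or more are kept."""
    groups = defaultdict(list)
    for term in terms:
        groups[crude_key(term)].append(term)
    return {key: sorted(members) for key, members in groups.items() if len(members) > 1}


terms = ["valve", "valves", "Welding", "welding", "glass", "corrosion"]
groups = candidate_groups(terms)
print(groups)
```

The groups are candidates only; deciding which form to keep remains an editorial judgment.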

The index was then edited. "See" and "see also" references
were created. When the word components of phrases could be
coordinated conveniently and apparently without "noise", the multiword
terms (i.e., phrases) were deleted and their postings transferred to
the appropriate single-word term cards. Certain of the original terms
were removed altogether (e.g., terms with very broad meanings).
Hierarchical (or generic) posting from one term to a broader, more
inclusive term was not consciously employed.

The  final  result  was  a  vocabulary  of  2,667  terms  and  "see" 
references.  The  number  of  "see"  references  or  terms  alone  was  not 
reported. 

The effort represented, insofar as can be determined, a somewhat
better than typical initial effort and one which is helpful in
illustrating the initial generation, via empirical means, of a
vocabulary for use in further coordinate indexing efforts.

4.3  Development  of  Vocabularies  from  Subject  Headings 

A number of coordinate vocabularies have been developed from
pre-existing subject heading lists. Ignoring the reasons for instituting
coordinate indexing systems at all, at least two factors appear
operative: (1) a relatively large document collection exists which
has already been subject cataloged, but which would have to be
re-indexed at considerable cost should the empirical method be employed,
and/or (2) the system developers have a familiarity with and
confidence in the subject heading list, as well as a natural desire to
preserve its advantages.

Whatever the reasons for choice of the "converted subject heading
approach", however, the method seems to have much to commend it from
a vocabulary-building viewpoint. The terminology is likely to reflect
not only the technological scope of the collection to a fair degree
but also a certain amount of standardization. Semantically imprecise
terms (e.g., "columns") are likely already to have been detected and
treated in some fashion, either implicitly or explicitly. A careful,
thoughtful conversion process can retain these advantages as well as
avoid the re-indexing of any pre-existing collection (although it is
probable that the converted index will be less effective for the
pre-existing collection than for new accessions).

One of the earliest studies in converting subject headings to
Uniterms is described in a paper by Thomas and Gull.^2 In the
experiment, the subject headings in the List of Subject Headings
of the Technical Information Division (TID) of the Library of
Congress and the Subject Heading List of the Document Service Center
(DSC) of the U.S. Armed Services Technical Information Agency were
converted to Uniterms. (Since their merger in March 1953,
these activities have been under the general direction of ASTIA.)

The DSC Subject Heading List contained about 10,380 headings
listed alphabetically by the inverted method in contrast to the TID
list of more universal headings. This difference was the most serious
impediment to a merger of DSC and TID catalogs by subject headings
since such a merger would have required changing one set of headings
to conform to the other. The TID List of Subject Headings contained
approximately 25,000 subject headings and 24,000 cross references in
one alphabet. Subject headings contained one or more words and were
followed by single or multiple subdivisions. The cross references
provided for various relationships between words and phrases.

When  the  conversion  of  the  DSC  and  TID  lists  had  been  completed, 
the  terms  and  cross  references  comprised  the  following  files: 

1. One alphabet of 3607 unit terms converted from
the TID list; 2287 terms were common to both
lists and 1320 were derived from the TID list only.

2.  One  alphabet  of  3275  unit  terms  converted  from 
the  DSC  list  only. 

3.  Approximately  720  synonymous  references  from 
the  TID  list. 

4.  Approximately  100  references  from  the  TID  list 
considered  unnecessary. 

5.  Approximately  800  references  from  the  DSC  list, 
not  separated  as  3  and  4  above. 

The five files listed above were completely edited before their
merger into one alphabet of Uniterm headings. The final product of
conversion was:

1. One alphabet of 6582 Uniterm headings, of which
1320 were derived from the TID list, 2975 were
derived from the DSC list, and 2287 were common
to both.

2. One alphabet of 549 synonymous cross references
(single words, abbreviations, and notations).

3. One alphabet of 2106 cross references considered
unnecessary (showing specificity, generalization,
and synonymity in phrases).
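The reported counts can be checked against one another with simple arithmetic. The Python lines below use only the figures quoted above; the variable names are illustrative, and the reading that the DSC-only count fell from 3275 to 2975 during editing is an inference from the two lists, not something the source states.

```python
tid_only = 1320                  # terms derived from the TID list only
common = 2287                    # terms common to both lists
dsc_only_before_editing = 3275   # DSC-only terms in the unedited file
dsc_only_final = 2975            # DSC-derived terms in the final product

tid_terms = tid_only + common
final_headings = tid_only + dsc_only_final + common
difference = dsc_only_before_editing - dsc_only_final

print(tid_terms)        # agrees with the 3607 unit terms from the TID list
print(final_headings)   # agrees with the 6582 Uniterm headings
print(difference)       # DSC-only terms not carried into the final alphabet
```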

Another vocabulary development utilizing the "converted subject
heading approach" is that of ASTIA's Project Mars.^3 The principal
headings were divorced from their subdivisions and, in one move, the
list was reduced from 70,000 to about 8300 main headings. The
850 subdivisions were reduced to about 700 by the removal of words
such as "application" that were no longer useful. These first
steps resulted in a tentative vocabulary of some 9000 terms.

The tentative vocabulary was then edited in a fashion similar
to that described above in the du Pont example. Synonyms were
detected and appropriate "use" (i.e., "see") references were created.
Also, certain only slightly used terms were included in other
closely related terms via the "use" reference route (e.g., "miticides
use acaricides"). These actions reduced the number of terms to fewer
than 7000.
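The effect of a "use" reference on the postings can be sketched as follows. This Python example is illustrative only: the document numbers are hypothetical, and the merging rule (all postings on the referred term are transferred to the preferred term) is an assumption about how such references would be applied.

```python
def apply_use_references(index, use_refs):
    """Transfer the postings of each referred term to its preferred term."""
    merged = {}
    for term, docs in index.items():
        preferred = use_refs.get(term, term)
        merged.setdefault(preferred, set()).update(docs)
    return merged


# Hypothetical postings and a single "use" reference.
index = {"miticides": {7}, "acaricides": {3, 9}, "insecticides": {3, 4}}
use_refs = {"miticides": "acaricides"}

merged = apply_use_references(index, use_refs)
print(merged)
```

The vocabulary shrinks by one term while no posting is lost, which is the trade described for slightly used terms.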

In addition, ASTIA provided "also see" (i.e., "see also")
references among terms. Finally, terms were assigned to groups
which were somewhat artificially designed according to the
organizational arrangement of ASTIA; the group name was also displayed
parenthetically with the term itself in the resulting authoritative
publication.^4 The cross references noted above were also displayed.

Following the initial vocabulary development and as a foundation
for the institution of a mechanized retrieval system at ASTIA, the
subject heading index was simultaneously converted both to the new
coordinate vocabulary and to a machine-readable medium.

Despite apparent shortcomings in time permitted and funding,
the ASTIA conversion represents a landmark. As contrasted with the
experimental conversion of the ASTIA vocabularies by Documentation
Incorporated, this conversion was actually used. The thesaurus was
known to be incomplete at the time of publication, was intended only
for internal use at ASTIA, and has been undergoing continuous review
by the ASTIA staff. (Section 5.2 contains a brief review of the
ASTIA Thesaurus revision effort.)

It should be noted that, according to ASTIA, the Uniterms
developed during the contract with Documentation Incorporated were
employed as aids in defining the descriptors, although in some cases
they became descriptors.^5

4.4 Development of Vocabularies from Classification Systems

Although in the original experimentation by Documentation
Incorporated a certain number of classes were converted in order to
demonstrate the method,^ the conversion of classified indexes to
coordinate indexes appears to have occurred only infrequently. This
is probably because the operators of hierarchically classified
systems may have had little to gain by converting their indexes. The
coordinate index which would result would require more effort in
retrieval (i.e., manipulation during retrieval) than would the original
tool and (unless the subclassification structure of the original tool
were much more detailed than is usually the case) any resulting
coordinate index (barring re-indexing of the collection) would be
inadequate because the individual documents would originally have
been "forced" into classes and the fine detail of information retrieval
thereby lost at the outset. Another reason for the paucity of such
conversions may be that very few progressive retrieval systems based
principally on hierarchical classification actually exist.

The picture is somewhat different, however, with respect to
faceted classification systems; few of these exist (at least in the
U.S.A.). Faceted classification systems are likely to have captured
at the outset much of the fine detail of information contained in
accessioned documents, but at the expense of convenience in notation
and file arrangement. Hence, they can be converted advantageously to
coordinate systems because the manipulative characteristics of the
latter can be employed to overcome the disadvantages of long complex
notations and complex rules for file arrangement.


One conversion from a faceted classification to a coordinate
index has been described by Wadington.^8 Wadington noted that the
desire for extremely detailed indexing resulted in 1949 in the
rejection of both hierarchical classifications and subject headings
as candidates for retrieval tools in the reported system. Rather,
a faceted classification derived from Ranganathan's colon system
was tried and adopted. He reports that "although the component parts
of the (classification) code were simple, the finished codes for
information were long and complex. Manual filing by complete code
was ruled out because it was too complicated and subject to too many
errors."

"Extensive use of cross referencing was next considered. A
card would be made out for each part of the code. While this would
simplify the filing, it would greatly increase the number of cards
to be processed." (This is assuming about eight cross-index cards
per document.)

"Punched-card techniques were then considered, assuming one
punched card per document; it was found that sorting time would
run as high as 16 hours to make sure that all the information on an
average question was retrieved."

Finally a coordinate indexing system was chosen and installed in
1954. Wadington reports that excessive noise resulted. This
difficulty was remedied by converting the previously developed
faceted classification to a coordinate system. Apparently each
facet or code component was converted to a subject term and the
appropriate document numbers posted thereon.
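A conversion of this kind can be sketched in Python. The notation used here is hypothetical and is not Wadington's code: components are assumed to be separated by colons, and the leading letter of each component is assumed to name a broader facet heading on which the document is also posted.

```python
def facets_to_postings(classified, separator=":"):
    """Post each document on every code component and on its facet letter."""
    postings = {}
    for doc_id, code in classified.items():
        for component in code.split(separator):
            postings.setdefault(component, set()).add(doc_id)
            # Assumed convention: the first character is the broader heading.
            postings.setdefault(component[0], set()).add(doc_id)
    return postings


# Hypothetical faceted codes: document number -> code.
classified = {11: "M3:P2", 12: "M3:P7", 13: "M5:P2"}

postings = facets_to_postings(classified)
print(sorted(postings["M3"] & postings["P2"]))
print(sorted(postings["M"]))
```

Once the components are separate terms, a search coordinates them by intersection instead of relying on the filing order of the complete code.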

Wadington noted that generic terms were consciously (and easily)
generated from the original classification and that "development of
these generic relationships would have been virtually impossible or
at best exceedingly difficult" otherwise.

4.5  Conclusions 

It can be concluded that effective coordinate vocabularies can
be developed by any of the three above techniques. The empirical
technique is based first upon "free indexing", and the results are
modified (according to need) to provide adequate control; thus, the cost
of vocabulary development is normally spread over a greater or lesser
period of time, perhaps at the expense of achieving truly satisfactory
retrieval of the first documents entering the system. The conversion
of a subject heading list, on the other hand, can be made at moderate
initial cost with the assurance that retrieval effectiveness will be
no worse than that provided by the original tool; in this instance
at least part of the cost of development of the subject heading list
is recovered. The same is true of conversion of classifications,
particularly faceted classifications, with the added advantage of
being able easily to detect generic relationships among terms.


REFERENCES


1. Wall, Eugene. Use of Concept Coordination in the du Pont
Engineering Department. Conference on Multiple Aspect Searching
for Information Retrieval, Washington, D.C., 1957, Armed Services
Technical Information Agency, pp. 12-25.

2. Thomas, Richard B. and C.D. Gull. The Choice of Terms for a
Uniterm Coordinate Index of Scientific and Technical Reports.
Studies in Coordinate Indexing, Vol. 1, pp. 47-55.

3. Heald, J. Heston. The Automation of ASTIA, December 1959,
AD 227 000.

4. The Thesaurus of ASTIA Descriptors. First Edition, May 1960.

5. See 3.

6. Gull, C.D. Alphabetic Subject Indexes and Uniterm Coordinate
Indexes, An Experimental Comparison. Studies in Coordinate
Indexing, Vol. 1, pp. 56-64.

7. Wachtel, Irma. Classification and Categorization in Information
Systems. Studies in Coordinate Indexing, Vol. 1, pp. 65-72.

8. Wadington, J.P. Modification of a Multiple Aspect System for
Company Use. Conference on Multiple Aspect Searching for
Information Retrieval, Washington, D.C., 1957, Armed Services
Technical Information Agency, pp. 36-43.


5.  CONTROL,  MODIFICATION  AND  GROWTH  OF  VOCABULARIES 

5.1  General 

5.2  Thesauri,  Authority  Lists,  and  Generic  Codes 

5.3  Editing  during  Indexing,  Posting,  and/or 
Retrieval 

5.4  Modifying  the  Vocabulary 

5.5  Compatibility 




5. CONTROL, MODIFICATION AND GROWTH OF VOCABULARIES

5.1 GENERAL

The effort needed for controlling vocabularies (once they
have been developed) and for modifying and adding to them would seem
to be equivalent for all systems, except (as noted above) that
conversion of a classification appears to provide somewhat better
control of generic relationships from the outset.

Control of a coordinate vocabulary, once it has been
developed, can be said to consist of two facets: (1) the proper use of
the established vocabulary throughout the system, and (2) the
modification of the vocabulary to conform to changing characteristics of
inquiries and of accessioned documents. Both of these forms of control
are discussed below.

An increasing number of operators of coordinate systems are
coming to rely upon some form of authority for controlling the use of
an established vocabulary. Typically, those systems which achieved
the greater degree of refinement of their initial vocabularies are also
those which exercise the greater degree of vocabulary control during
use of their vocabularies. This is understandable because "vocabulary
refinement" and "vocabulary control" are closely related and those
environmental factors which affect the exercise of one of these two
functions will also affect the exercise of the other.

Thus the control during use of coordinate vocabularies
varies from essentially "no control" (i.e., indexers continue to
choose terms from the literature without regard to those previously
chosen, and little if any control of synonyms or even word forms
is exercised during posting of the tracings to the index) to rather
complete control, i.e., where an authority is employed either by
indexers or by editors to keep the terminology used consistent with the
established vocabulary. (The consensus seems to reflect the knowledge
that the best indexing is performed by competent personnel, having
both a subject specialty and some familiarity with the preparation of
indexes. When such a combination of talents exists in the same person,
the need for extensive editing becomes less imperative.)

Such control can, of course, vary in its degree of rigidity. It may,
at a minimum, consist of instructing indexers to employ the word
forms previously accepted and established by a simple authority list
of terms, "see" references and "see also" references (e.g.,
"sulphuric acid" see "sulfuric acid" or "distilling" see
"distillation"). New terminology, when encountered by an indexer, can
be "flagged" for review at least to insure that synonymous
relationships with the established vocabulary are not involved. In
addition, new "see also" references may be established both to and
from any accepted new terms. In practice, these actions constitute
the limit of exercisable control using a simple authority list.
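
The minimum form of control described above can be sketched in a few
lines of Python. This is only an illustration under stated
assumptions: the names SEE, ACCEPTED and normalize are hypothetical,
and the two "see" references are the examples given in the text.

```python
# Hypothetical simple authority list: accepted terms plus "see" references
# that point from a rejected word form to the accepted one.
ACCEPTED = {"sulfuric acid", "distillation"}
SEE = {"sulphuric acid": "sulfuric acid", "distilling": "distillation"}


def normalize(term, accepted=ACCEPTED, see=SEE):
    """Return (term_to_post, flagged_for_review).

    Accepted terms pass through, "see" references are followed, and
    anything else is returned unchanged but flagged as new terminology.
    """
    key = " ".join(term.lower().split())  # fold case and extra spaces
    if key in accepted:
        return key, False
    if key in see:
        return see[key], False
    return key, True  # new terminology: flag for editorial review
```

A flagged term is not rejected outright; it is merely set aside so
that an editor can check it for synonymy with the established
vocabulary.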

Whether or not structure, i.e., vocabulary control, should be
introduced into a system at the input level has not been determined.
An argument for keeping constraints at the input to a minimum is
represented in the following statement by H. P. Luhn.

"Excessive editing obviously increases the
likelihood of bias due to current interest,
experiences, and points of view. In consequence
the usefulness of a system will be reduced as
emphasis and interest change. It would therefore
appear that the less information is classified
and contracted at the input, the more it will
lend itself to dynamic interpretation at the
output phase."¹

5.2 THESAURI, AUTHORITY LISTS, AND GENERIC CODES

An increasing degree of interest and activity in using thesauri (or
their equivalents) as authorities is apparent. The Index to Current
Research and Development in Scientific Documentation, No. 9, lists
eight references to organizations working with or developing
thesauri. There may be others, since there is no reference in the
Index to the work on thesauri now going on at ASTIA.

There are other aids to indexers that are employed internally in the
working systems we have encountered. Most of these are rather
informal and are not available for distribution.

The existence of a thesaurus provides an opportunity for the
operators of a system to control the vocabulary, and to insure its
consistent use, at any or all phases of system operation: indexing,
posting, or inquiry processing. Synonyms are indicated in a suitable
thesaurus, as are "up and down" generic relationships and other
relationships of unspecified, indeterminate or variable type (i.e.,
"see also's"). Thus a thesaurus may serve as the means of bringing to
the attention of the indexer those vocabulary terms which might be
employed in making a search for the document at hand and/or to the
attention of the searcher those vocabulary terms which might have
been employed by the indexer in describing documents pertinent to the
question at hand. The result is greatly improved retrieval
effectiveness.
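
The three kinds of relationship named above (synonyms, "up and down"
generic relationships, and "see also's") suggest a simple record
structure. The sketch below is an assumption made for illustration
only; the Entry class, the suggest function and the sample entry are
hypothetical and are not taken from any thesaurus described in this
report.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Entry:
    synonyms: Set[str] = field(default_factory=set)  # forms that "see" this term
    broader: Set[str] = field(default_factory=set)   # generically higher terms
    narrower: Set[str] = field(default_factory=set)  # generically lower terms
    related: Set[str] = field(default_factory=set)   # "see also" terms


# Hypothetical one-entry thesaurus.
THESAURUS: Dict[str, Entry] = {
    "distillation": Entry(
        synonyms={"distilling"},
        broader={"separation"},
        narrower={"vacuum distillation"},
        related={"evaporation"},
    ),
}


def suggest(term: str, thesaurus: Dict[str, Entry] = THESAURUS) -> List[str]:
    """List the accepted terms an indexer or searcher might also consider."""
    entry = thesaurus.get(term)
    if entry is None:
        return []
    # Synonyms are omitted: they are not accepted terms in their own right.
    return sorted(entry.broader | entry.narrower | entry.related)
```

The same lookup serves both parties: the indexer sees terms a
searcher might use, and the searcher sees terms an indexer might have
used.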

The WRU Semantic Code and the U. S. Patent Office Generic Codes fall
into the class of thesauri because they have incorporated, in some
way, all of the three characteristics mentioned above. They differ
from more conventional thesauri in that they are designed to be
brought into play during mechanized posting or search rather than by
a searcher during his formulation of a query or by a human editor.

The creation of a thesaurus for information retrieval purposes is,
however, an enormously expensive task, although this cost may for a
given system be reduced markedly by referring to thesauri created
earlier by others whose technological interests are sufficiently
similar to those of the system in question that much of the earlier
work may be utilized intact.

There is currently a great deal of time and energy being expended in
the general area of thesaurus preparation. Although there is far from
universal agreement on the necessity or desirability for such
thesauri, many installations would probably utilize such an authority
if its coverage were extensive enough.

Agreement is also lacking on the point of maximal utility of the
thesaurus: is its prime function to assist the indexer or the
searcher?

Described below are the ASTIA and A.I.Ch.E. thesauri. Other thesauri
exist or are being developed, but these represent the best efforts to
date. Also described below are Western Reserve's Semantic Code and
the Patent Office's Generic Code.

5.2.1  THESAURUS  OF  ASTIA  DESCRIPTORS 

The Thesaurus of ASTIA Descriptors (First Edition), ASTIA, 1960,
entangles its technical vocabulary (which is extensive, although
oriented largely to military terminology) too heavily in multiword
terms to be truly useful to others working with different
collections. Further, its "also see" references are limited, and it
does not exhibit generic relationships. However, it is understood
that these shortcomings will be at least alleviated in the second
edition now being prepared. The preparation of the first edition was
described by J. Heston Heald in Section III of the previously cited
ASTIA report AD-227000. This description serves to report not only on
one method of thesaurus development but also on how to build a
thesaurus upon an earlier subject heading list.

*For information on expense, see "Cost of Generic Coding," by
Mortimer Taube, in Studies in Coordinate Indexing, Vol. III, pp. 34-57.

5.2.2  CHEMICAL  ENGINEERING  THESAURUS 

The Chemical Engineering Thesaurus, 1961, by the American Institute
of Chemical Engineers, covers an extensive technical vocabulary
(albeit somewhat more attentive to engineering terminology) and
provides extensive synonymous, generic, and "related term" (i.e.,
"see also") cross-referencing. Because it was designed to be used in
conjunction with a system having syntactical controls, however, it
lacks definition of terminology sufficient to permit its use by
others without some effort. For example, terms standing for materials
(e.g., "water") do not carry any indication of the roles they play in
the system (e.g., "water" used for "cooling" or "water" being
"cooled"); the editors of this work intended that the indexers add an
appropriate role indicator to such terms.

Nevertheless, the Chemical Engineering Thesaurus seems to be, to
date, the nearest approach to a true and generally useful information
retrieval thesaurus. It was first developed as an internal work by
du Pont, and the development has been described in detail by
B. E. Holm and L. E. Rasmussen. ^ This description is accompanied by
a valuable review of other work with (or related to) technical
thesauri and by an exceptionally complete list of references to the
work of others.

5.2.3 WRU SEMANTIC CODE*

The terms used by an indexer are converted to semantic codes
automatically if the terms are already in the system, but if the
terms are new, editors create new codes.

The purpose of the semantic code is to automatically bring into play
all applicable aspects of a term. That is, it attempts to cover both
higher and lower generic levels of terms as well as to encompass
semantically related concepts.

An example is the word "diamond" coded as CERB CWRS PYPR 1028, which
may be interpreted as "a crystalline form composed of carbon and
characterized by hardness." (The codes have been arbitrarily limited
to four factors.) Any one of the factors may be searched. That is,
the items indexed by "diamond" would be retrieved whether the
question called for that specifically or for "things composed of
carbon", "hard things", or "crystalline forms".

Actually, CERB CWRS PYPR 1028 is not the complete code for diamond,
since under the principles of the semantic code this would be true
not only of diamonds but of any other, say, hard crystals of carbon.
To specify that diamonds and only diamonds are wanted, a further
element, the numerical suffix, must be given. This is a four-digit
figure, the first numeral of which is the number of factors in the
term. The other three are those peculiar to the particular concept.
In this instance the numerical suffix might be 3001. Note that one of
the factors, PYPR, is followed by the numerical infix 1028. This 1028
is the identifying numerical suffix for that particular physical
property (P-PR), "hardness". Its use as a numerical infix here shows
that the particular physical property characterizing (Y) diamonds is
hardness. Only numerical suffixes beginning with 1 can be used for
infixes. Since 2's, 3's, and 4's indicate that the code for the
concept has more than one factor, their use as infixes would make it
impossible to particularize specific concepts within the generic
framework of a code.

*American Documentation will publish "A Note on the Evaluation of the
WRU Semantic Code as an Example of Generic Coding," by Mortimer Taube,
Documentation Incorporated, in the April 1962 issue.

This is further explained by the definitions given below.

Semantic factor. By this term is meant the separate units of a code,
expressed by three consonants. In RAML RWHT TQMS 1002 3679, the
semantic factors are R-ML, R-HT, and T-MS. Each semantic factor
represents one of a number of highly generic concepts. Together they
form, as it were, the building blocks of the code. It should be noted
that within a code composed of more than one semantic factor, the
separate semantic factors are arranged alphabetically ignoring the
infixes.

Infix. By this term is meant certain symbols used with the semantic
factors in a code. In RAML RWHT TQMS 1002 3679, the infixes are A, W,
Q, and 1002.

Alphabetic infix. By this term is meant the infixes represented by
alphabetic symbols. They show the analytic relationships of the
semantic factors in which they appear to the concept represented by
the code.

Numerical infix. By this term is meant the infixes represented by
numerals. They show, where used, a degree of particularization in the
semantic factor to which they are affixed. Actually every semantic
factor may be thought of as possessing a numerical infix; however,
only in certain instances are they explicit, that is, they actually
appear in the code. In the majority of instances, they are implicit,
that is, they represent a numerical infix "1001" which is not
actually printed out.

Numerical suffix. By this term is meant the particularizing number
assigned each individual code to distinguish it from all other codes
which, though they represent different concepts, contain the same
semantic factors.
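
The definitions above can be made concrete with a small parser. The
sketch below is illustrative only and rests on an assumed notation
that is not that of the WRU system: units are separated by ".", a
numerical infix is attached to its factor with ":", and a unit
consisting only of digits is taken as the numerical suffix. The names
Factor, parse_code and has_factor are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Factor:
    base: str                # semantic factor, e.g. "C-RB"
    alpha_infix: str         # alphabetic infix, e.g. "E"
    num_infix: str = "1001"  # implicit unless written out


def parse_code(code: str) -> Tuple[List[Factor], Optional[str]]:
    """Split an assumed-notation code into its factors and numerical suffix."""
    factors: List[Factor] = []
    suffix: Optional[str] = None
    for unit in code.split("."):
        if unit.isdigit():          # a bare number is the numerical suffix
            suffix = unit
            continue
        letters, _, num = unit.partition(":")
        base = f"{letters[0]}-{letters[2:]}"  # drop the infix letter
        factors.append(Factor(base, letters[1], num or "1001"))
    return factors, suffix


def has_factor(code: str, base: str) -> bool:
    """True if the code contains the given semantic factor (generic search)."""
    return any(f.base == base for f in parse_code(code)[0])
```

Under this assumed notation the diamond example would be written
"CERB.CWRS.PYPR:1028.3001", and a search on the factor C-RB alone
would retrieve it along with every other code containing that factor.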

The semantic code, then, attempts to eliminate the manual or machine
"see also" or "see" references. Furthermore, it attempts to include
generic levels and general characteristics.

The code, of course, varies according to the system requirements. For
example, in some systems the following code for diamond may be more
valuable: CERB CWRS GUMH MANR; this may be interpreted as a "mineral
in crystalline form composed of carbon and used as a gem".

For comparison, the A.I.Ch.E. thesaurus entry for "diamond" is shown
below.

DIAMOND
PO  Carbon
RT  Abrasives
RT  Crystal

PO = Post (also) on, and RT = Related term

It is obvious that it is not necessary to include all possible
meanings, relations, or generic levels. For example, diamond in the
sense of a baseball diamond would certainly be superfluous in a
metallurgical system. However, whether or not WRU's four factors are
sufficient and whether or not the factors chosen best suit the users'
needs are both open to debate.

5.2.4 U. S. PATENT OFFICE GENERIC CODING

Many organizations, notably the Patent Office, have proposed that a
coordinate indexing system include certain generic relationships
among terms in the system, to be provided by codes which exhibit such
relationships. From one point of view, a total code can be considered
as a term, and a search can be made for the members of the classes
defined as logical functions of such codes. From another point of
view, a code can be considered as a class number, so that different
parts of the number can provide for generic searches on different
levels. For example, in one system proposed by the Patent Office, the
established generic relationships of chemical names are replaced by a
system of generic codes, as follows:

Heterocyclic Compounds     1313
Para-N-benzene Sulfoxy     1313-1512
Azoles                     1313-2512
Thiazoles                  1313-2512-1423
Oxazoles                   1313-2512-1523

It is possible, of course, to provide for generic searches by
indexing and posting on several levels without employing numerical
codes. On the other hand, the numerical code is presumed to provide
automatic generic indexing whenever the specific term is indexed.
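
Treating a code as a class number means that a generic search reduces
to matching the leading parts of the code. The following sketch uses
the five codes listed above; the function name generic_search and the
dictionary layout are assumptions made for illustration.

```python
# Generic codes as listed in the text: each hyphen-separated part adds a level.
CODES = {
    "Heterocyclic Compounds": "1313",
    "Para-N-benzene Sulfoxy": "1313-1512",
    "Azoles": "1313-2512",
    "Thiazoles": "1313-2512-1423",
    "Oxazoles": "1313-2512-1523",
}


def generic_search(query_code, codes=CODES):
    """Return every name whose code begins with the parts of query_code."""
    wanted = query_code.split("-")
    return sorted(
        name
        for name, code in codes.items()
        if code.split("-")[: len(wanted)] == wanted
    )
```

A search on 1313-2512 thus retrieves the azoles together with the
thiazoles and oxazoles, although only the specific code was assigned
when each compound was indexed.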

5.3 EDITING DURING INDEXING, POSTING, AND/OR RETRIEVAL

Vocabulary control (as distinct from vocabulary modification) can
thus be exercised with the aid of some device such as an authority
list or preferably a thesaurus; the types and degree of possible
control have been described above. This section of the report will
deal with where in the system the control can be exercised and note
the reasons why different systems are reported to exercise control at
different points. The control can be imposed by the indexer, by an
editor, or by the searcher; thesauri or authority lists might be used
in the process.

A.  Vocabulary  Control  During  Indexing 

Control of the vocabulary during indexing places the burden of use of
the authority (list or thesaurus) upon the indexer. In short, the
indexer is expected to analyze the document at hand, to develop a
tentative list of retrieval terms therefor, and then to refer to the
authority for the following purposes:

(1) To determine if each tentatively-chosen term is in the
    vocabulary; if so, to proceed to step (2) below; if
    not, to take the appropriate one of the three actions
    listed immediately below:

    -- choose an accepted synonym (or sufficiently
       near-synonym) for the tentatively-chosen term

    -- choose an accepted spelling (or word form) for
       the tentatively-chosen term

    -- decide whether the tentatively-chosen term justifies
       being noted for consideration as an addition to
       the vocabulary; if not, delete the term from the
       tracing for the document.

(2) To examine the specified generic relationships with
    other terms in the vocabulary and:

    -- to add to the tracing generically-higher terms if
       the chosen term is of sufficient importance to the
       information involved to justify such action, and

    -- to add to the tracing appropriate generically-lower
       terms if such specificity is appropriate,
       considering the specificity of the information involved.

(3) To replace the tentatively-chosen term with two or more
    accepted terms.

(4) To add to the tracing appropriate additional terms
    chosen from the list of terms "seen also" from the
    chosen term.

Vocabulary control at this point is obviously costly in indexer time,
yet it insures the best possible indexing. It does, however, tend to
develop such "deep" indexing that the use of syntactical controls
(e.g., role indicators and links) may be required to prevent
excessive noise during retrieval; the use of syntactical controls
adds further to the cost of indexing. In spite of this, if the ratio
of accessions to inquiries is low, and if the need for speed and ease
of answering inquiries is great, vocabulary control during indexing
may well prove to be most economical of time and money.

B.  Vocabulary  Control  During  Indexing  and  Posting 

This method places a lesser burden upon the indexer. In short, the
indexer might only exercise functions (1), (3) and (4) as described
in the above paragraph. Function (2), the addition of generic
relationships, then is automatically performed (either clerically or
by machine) at the time the individual postings are made to the
index. However, only generically-higher relationships can be added
automatically (the proper addition of generically-lower relationships
being impossible to perform at this step), and such
generically-higher relationships must be added blindly, irrespective
of the importance to a given document of the term upon which they are
based. The advantage of this method is that it does save a certain
amount of costly indexer effort.
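
The automatic addition of generically-higher terms at posting time
might be sketched as follows. The BROADER table and the function
names are hypothetical; the point illustrated is only that every
higher term is posted blindly, with no judgment of its importance to
the document.

```python
from collections import defaultdict

# Hypothetical authority: each term maps to its generically-higher terms.
BROADER = {
    "thiazoles": ["azoles"],
    "azoles": ["heterocyclic compounds"],
}


def generic_ancestors(term, broader=BROADER):
    """Collect every generically-higher term reachable from `term`."""
    found, pending = [], list(broader.get(term, []))
    while pending:
        candidate = pending.pop()
        if candidate not in found:
            found.append(candidate)
            pending.extend(broader.get(candidate, []))
    return found


def post(index, doc_number, tracing, broader=BROADER):
    """Post a document on each tracing term and, blindly, on all its ancestors."""
    for term in tracing:
        for target in [term, *generic_ancestors(term, broader)]:
            index[target].add(doc_number)
```

Generically-lower terms are deliberately absent from the sketch: at
this step nothing in the tracing says which narrower term, if any,
applies to the document.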

C.  Vocabulary  Control  During  Posting  Only 

This method relies upon competent nonprofessional personnel or upon
sophisticated machine programs (or upon both) to carry out the four
functions described in paragraph (A) above. The function of
"normalizing" tentatively-chosen terms into accepted terms may be
quite well performed at this point. The function of the addition of
generic relationships is performed just as in paragraph (B) above.
However, the function of replacing the tentatively-chosen terms with
two or more accepted terms and the function of adding terms implied
by the information in a given document are difficult, if not
impossible, to perform at this point.

The advantage of this method is that it saves a large amount of
costly indexer effort.

D. Vocabulary Control During Indexing and Inquiry

This method requires that the indexer perform only function (1),
"normalizing" tentatively-chosen terms into accepted terms, and
relies upon inquiry-processing operations to compensate for the
nonperformance at an earlier time of functions (2), (3) and (4). That
is, rather than phrasing inquiries as simple logical intersections of
terms (e.g., all information on A and B and C, etc.), the phrasing
will take the form of intersections of term unions (e.g., all
information on A and/or Z and/or Y, etc., intersected with (and)
information on B and/or X and/or W, etc.). The terms chosen to
constitute these unions are based upon the initial terms of the
inquiry and are developed from the generic relationships
(particularly lower generic relationships) and "see also"
relationships exhibited in the authority for the initial terms of the
inquiry.
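
An inquiry phrased as an intersection of term unions can be evaluated
directly with set operations. In this sketch the postings and
document numbers are invented for illustration, and the function name
search is an assumption.

```python
# Hypothetical postings: term -> set of document numbers posted on it.
POSTINGS = {
    "A": {1, 2, 3}, "Z": {4}, "Y": {5},
    "B": {2, 4, 6}, "X": {5}, "W": {7},
}


def search(groups, postings=POSTINGS):
    """Intersect the unions of postings for each group of alternative terms."""
    result = None
    for group in groups:
        union = set()
        for term in group:
            union |= postings.get(term, set())  # A and/or Z and/or Y ...
        result = union if result is None else result & union
    return result if result is not None else set()
```

With these postings the simple intersection of A and B yields only
document 2, whereas the expanded inquiry (A and/or Z and/or Y)
intersected with (B and/or X and/or W) yields documents 2, 4 and 5.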

This method is practicable only when suitable retrieval machines are
available; without such machines the inquiry becomes excessively
laborious and complex. This fact explains why operators of mechanized
systems usually report that their inquiries involve unions of terms
whereas numerous operators of manual retrieval systems have reported
that their inquiries are usually expressed merely as intersections of
terms.⁴

The advantages of this method are that the indexer effort is reduced
markedly and that the choice of additional terms during retrieval is
based upon a "real-world" situation (as defined by one inquiry),
whereas the choice of additional terms during indexing has to be
based upon hypothetical situations (as implied by all possible
inquiries). Furthermore, the actual bulk of the index is decreased.
On the other hand, the time and cost of processing inquiries is
increased.

E. Vocabulary Control During Posting and Retrieval

This method is essentially the same as that described in the
preceding paragraph except that function (1), "normalization", is
performed by competent nonprofessional personnel rather than by the
indexer. It reduces somewhat the indexing work load, but all other
advantages and disadvantages remain as described above.

Given the possession of suitable retrieval machines, and assuming the
usual (high) ratio of accessions to inquiries, it would seem that
this method of vocabulary control offers the most advantageous choice
for most retrieval systems.

An editor or anyone else concerned in the control and improvement of
a vocabulary must rely on "feedback". If the user or searcher reports
on the effectiveness of terms or coordinations of terms, the
vocabulary can be controlled accordingly.

5.4  MODIFYING  THE  VOCABULARY 

It is generally recognized that system vocabularies cannot remain
static but must change to accommodate the changing technology
reported in new accessions to collections. Basically, of course,
there exist two forms of modification: addition of new terms and
deletion of terms found to be of little value. However, the means of
achieving these two forms of modification vary.

Under any circumstances, a route must be provided for adding new
terms to the vocabulary. For example, one can well imagine the
difficulties and unnecessary work which would be created had the
now-heavily-used acronym "radar" been excluded initially from the
ASTIA vocabulary with the insistence that all documents dealing with
radar be posted instead on "radio", "detection" and "ranging".

One (and a principal) means of addition to a vocabulary was noted
above: i.e., the "flagging" of newly-encountered terminology during
indexing (or posting) for consideration for addition to the
vocabulary. Ideally, such new terminology should not be accorded
status as retrieval terms until at least a moderate degree of
usefulness has been proven; hence, accurate records of proposed new
terminology and of its frequency of use must be kept. Further, if the
proposed new terms are held aside in a subsidiary index until their
usefulness has been proven, it will be easy to transfer them
(together with their postings of document numbers) to the main index
when their utility has been proven or to merely delete them (with
their postings) from the subsidiary index when their utility has been
conclusively denied.
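
The bookkeeping for a subsidiary index might look like the sketch
below. It is an illustration only: the class name, the method names
and especially the numeric threshold are assumptions, since the text
asks merely for "a moderate degree of usefulness" without saying how
that is measured.

```python
class SubsidiaryIndex:
    """Holds proposed terms and their postings apart from the main index."""

    def __init__(self, threshold=3):
        self.threshold = threshold  # assumed measure of proven usefulness
        self.postings = {}          # proposed term -> set of document numbers

    def flag(self, term, doc_number):
        """Record one use of a newly-encountered term."""
        self.postings.setdefault(term, set()).add(doc_number)

    def promote_proven(self, main_index):
        """Move terms that reached the threshold, with postings, to the main index."""
        proven = sorted(
            t for t, docs in self.postings.items() if len(docs) >= self.threshold
        )
        for term in proven:
            main_index.setdefault(term, set()).update(self.postings.pop(term))
        return proven

    def reject(self, term):
        """Delete a term whose utility has been denied, with its postings."""
        self.postings.pop(term, None)
```

Because the postings travel with the term, promotion requires no
re-indexing of the documents on which the term was first flagged.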

ASTIA follows these techniques to a large degree by maintaining its
"Identifier Listing".⁵

A second means of adding to vocabularies is that of precoordination
of existing terms. When it is found that two terms may be combined to
create a third term, the two original terms are retained.

Deletion of vocabulary terms may be merely that, the action being
based upon low usage during indexing and/or retrieval. More usefully,
however, terms should be deleted by transferring their postings to
either a generically-higher term (or terms) or to a sufficiently
near-synonym. In such deletions the term should be replaced by
appropriate "see" references.
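
Deletion by transfer of postings can be sketched as a single
operation on the index and the authority. The function and argument
names below are hypothetical.

```python
def retire_term(index, see_refs, old_term, replacements):
    """Delete old_term, move its postings, and leave a "see" reference behind.

    index:        term -> set of document numbers
    see_refs:     deleted term -> list of terms to "see" instead
    replacements: generically-higher term(s) or a near-synonym
    """
    docs = index.pop(old_term, set())
    for term in replacements:
        index.setdefault(term, set()).update(docs)
    see_refs[old_term] = list(replacements)
```

For example, retiring "distilling" in favor of "distillation" merges
the two sets of postings and records "distilling" see "distillation"
in the authority, so that no document is lost to later searches.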

Whatever the form taken by vocabulary modification, the authority for
the vocabulary must reflect the change. The generic and other related
references of new terms must be exhibited, and the pre-existing terms
must similarly be referred appropriately to the new terms. When terms
are deleted, all references to them in the authority (except an
appropriate "seen from" reference) must also be deleted.

Little has been reported in the literature on formal modification of
vocabularies. Perhaps this is because relatively few vocabularies are
yet truly well controlled. ASTIA is now preparing the second edition
of its Thesaurus, and this operation is being well documented in a
continuing series of publications.⁶,⁷,⁸

ASTIA has stated the following.

"A major objective in revising the Thesaurus of
ASTIA Descriptors is to provide an improved ASTIA
indexing authority in a form most useful (1) to
assist analysts in making consistent and sufficiently
complete assignment of descriptors to accessioned
technical information and (2) to assist bibliographers
in making a corresponding consistent use of
the descriptors during the formulation of inquiries
for mechanized retrieval.

"A second major objective in revising the Thesaurus
is to create a device which will be as useful as
possible to reference personnel in organizations
other than ASTIA. In this connection, ASTIA is
anxious during revision of the Thesaurus to have the
cooperation and active participation of all individuals
and organizations who can assist in making the
Thesaurus more useful both to themselves and to ASTIA."⁹

An important purpose of ASTIA's request for cooperation was to inject
both user orientation and user feedback into the effort.

Although the initial publication has been continually reviewed since
it was issued, revisions were kept to a minimum in order to insure
that the statistics obtained during its use were meaningful. (These
statistics are currently being employed to determine the need for
coalescing closely related descriptors or reducing the number of
broad descriptors.)

The current revision program, initiated formally in mid-1961 and
continuing to date, represents a cooperative effort of considerable
magnitude. The new edition of the Thesaurus is scheduled for
distribution in the Autumn of 1962.

5.5 COMPATIBILITY

No report would be complete without some reference to the many
efforts involved in trying to achieve compatibility among various
information systems. Several major endeavors in this area are,
accordingly, briefly described.

An Ad Hoc Interagency Study Group on Language Compatibility in
Mechanized Storage and Retrieval Systems was organized in October
1961. Looking forward to the day when many agencies will have
mechanized information systems, this group is directing its attention
to total systems compatibility, rather than compatibility of
thesauri.

In a 1950 paper,¹⁰ Mortimer Taube pointed out that the long-time
effort to secure uniformity of subject cataloging and subject control
on an international and national level had broken down. Even with
reference to our three national libraries, the Library of Congress,
the National Medical Library, and the Department of Agriculture
Library, different authority lists, different methods of indexing,
and different methods of classification are employed, reflecting
differences in the collection and types of service. It was suggested
in that paper that the way towards compatibility was in the direction
of free indexing and unstructured vocabularies. The initial work on
coordinate indexing was essentially a practical demonstration of this
view. The authority lists used by the Air Force and Navy sections of
ASTIA were structured differently and, hence, incompatible. One of
the specific tasks carried out by Documentation Incorporated and
reported to ASTIA was the construction of a single, minimally
structured list from the two existing lists.

In the current revision of the ASTIA Thesaurus, compatibility of the
vocabulary was given prime consideration:

"Thus the vocabulary should be as compatible as
possible with other similarly used vocabularies --
and the Thesaurus, as the principal means for
achieving such compatibility, should make it possible
for other organizations to 'translate' their
vocabularies to or from that of ASTIA -- and for ASTIA to
do the same with other vocabularies. In this respect,
the assistance of organizations other than ASTIA will
prove invaluable."

It has been pointed out, however, that the complexity of structure
as reflected in a thesaurus does not contribute to compatibility of
indexing systems. Undoubtedly on the basis of his own experience with
thesaurus construction, Paul Klingbiel of ASTIA has said, for example:

"But the documentalist is not at all concerned
with the possible meanings a term may have. He
is concerned only with that variety of meanings
which occur in his collection. Moreover, he is
charged with the responsibility of storing and
retrieving that segment of man's knowledge represented
by his library." (italics Klingbiel's)

That is not to say that a thesaurus or any similar highly
structured vocabulary may not be of great value as a description of the
vocabulary structure of a special collection and, hence, as an important
tool for the searcher, but it does say that there can be no universal
thesaurus which has prescriptive significance for indexers working
in different collections in different information centers.

It is understood that the Engineers Joint Council is becoming
interested in a plan of abstracting and indexing journal articles at
the time of publication to render them more readily and promptly
suitable for effective storage and retrieval by the various systems
employed at point of use. This system has been instituted by the
American Institute of Chemical Engineers, and their three journals
now carry index terms and abstracts for the separate articles.
The program has been well accepted by the A.I.Ch.E. membership and has
been adopted as a permanent part of their literature service. The
Chemical Engineering Thesaurus was a major tool in the implementation
of this program.

6.  ROLES AND LINKS

6.1  Description and Definition

6.2  Operating Experience with Roles and Links

6.1  Description and Definition

Although they are called by several names and
implemented in many forms (which will be described in the second part
of this section), roles and links can be defined quite simply:

Roles narrow the definition of the terms by designating
the role of a word in context; they are generally, but
not necessarily, symbols appended to terms or term
numbers.

Links are grouping devices; they are generally, but not
necessarily, symbols appended to item numbers.

Links show that terms are related. Roles in themselves do not
show how terms are related, because the role affixed to a term only
describes the use of that term in context; it is only upon coordination
of terms which have roles that a relationship among terms is established.
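The two definitions can be illustrated with a small sketch in Python.
It is only an illustration under assumed conventions: the posting
layout (item number, link letter, role name), the sample items and
terms, and the names `post` and `coordinate` are hypothetical and are
not taken from any system described in this report.

```python
from collections import defaultdict

# Hypothetical inverted file: term -> set of (item number, link, role).
# The link letter groups terms within one item; the role names how
# the term is used in context.
index = defaultdict(set)

def post(item, link, term, role):
    """Post one term under an item, with its link and role."""
    index[term].add((item, link, role))

# Item 101 covers two separate matters, kept apart by links A and B.
post(101, "A", "water", "agent")        # water used for cooling
post(101, "A", "cooling", "process")
post(101, "B", "steel", "input")
post(101, "B", "annealing", "process")
# Item 102 is about the cooling of water itself.
post(102, "A", "water", "input")
post(102, "A", "cooling", "process")

def coordinate(query, use_links=True, use_roles=True):
    """Intersect the postings of (term, role) pairs; return item numbers."""
    groups = None
    for term, role in query:
        found = {
            (item, link if use_links else None)
            for item, link, r in index[term]
            if not use_roles or r == role
        }
        groups = found if groups is None else groups & found
    return sorted({item for item, _ in (groups or set())})
```

In this sketch the role on "water" says nothing by itself about
"cooling"; the relationship appears only when the two role-bearing
terms are coordinated. Ignoring roles retrieves both items, and
ignoring links lets "water" and "annealing" coordinate falsely in
item 101.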

One of the most perplexing problems encountered in the study of
roles and links is in nomenclature. Roles are variously called role
indicators, modifiers, and modulants; occasionally, scope notes are
called roles or role indicators. Terms with roles attached have been
called "structerms".

Links are variously called interfixes, link letters, punctuation,
and association.

Scope notes are erroneously called roles, because although scope
notes do "narrow the definition of terms" (as specified in the
definition of roles) they do not "designate the role of a word in
context". Scope notes are used generally only to differentiate
homographs.

The nomenclature confusion has arisen for one main reason: when
one modifies a concept even ever so slightly, he indicates this fact by
giving it a new name. Sometimes it is only a modification in scope;
sometimes it is a modification in form.

For example, Western Reserve University's indexers indicate grouping
of terms into sentences or phrases by dots. They call this
"punctuation" rather than links. (Their links have taken this form
because they use a term-on-item system rather than an inverted system.
For further details, see Section 3.4.) Because it is possible to
symbolize a link either by a letter or a number, one group at du Pont
has called their links "link letters".

There is at least one person who uses links and does not call
them links, nor any of the other terms mentioned above. In a paper
presented at the Fourth Institute of Information Storage and Retrieval,
at The American University, William B. Kehl described work in storage
and retrieval of legal material at the University of Pittsburgh.1 In
this system, as in WRU's, the index is a term-on-item file; the text of
the document, including punctuation, is entered intact on computer tapes.
Prof. Kehl said that the computer program was such that it could accept
the instruction "retrieve this document if and only if this word is
within five words (or some other number of words) of that word". An
example he gave included the terms "unable" and "pay" or "payment".
This reduces (but does not completely eliminate) the possibility of
retrieving documents which in fact have these two terms but not the
expression "unable to pay". The computer can also accept the
instruction, "retrieve this document if and only if this word and
that word are in the same sentence". These instructions are grouping
devices, hence, links.

Prof. Kehl's system also uses "roles", although here again, he
did not so call them. The computer can accept the instruction,
"retrieve this document if and only if this word precedes that word".
This will increase (though not guarantee) the chances of retrieving,
e.g., "water for cooling" rather than "cooling of water". The
example he gave was "district school" versus "school district".
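The three instructions attributed to Prof. Kehl's program can be
sketched as simple tests on full text. The Python below is not his
program; the function names, the sentence-splitting rule, and the
sample text are assumptions made only for illustration, and "precedes"
is read here as "occurs somewhere before".

```python
import re

def sentences(text):
    """Split text into sentences, each a list of lowercase words."""
    parts = re.split(r"[.!?]+", text)
    return [re.findall(r"[a-z']+", p.lower()) for p in parts if p.strip()]

def word_positions(text, word):
    """Positions of a word in the running text, counted in words."""
    words = [w for s in sentences(text) for w in s]
    return [i for i, w in enumerate(words) if w == word]

def within(text, a, b, n=5):
    """Link: word a occurs within n words of word b, in either order."""
    return any(abs(i - j) <= n
               for i in word_positions(text, a)
               for j in word_positions(text, b))

def same_sentence(text, a, b):
    """Link: word a and word b occur in the same sentence."""
    return any(a in s and b in s for s in sentences(text))

def precedes(text, a, b):
    """Role-like test: some occurrence of a comes before some b."""
    return any(i < j
               for i in word_positions(text, a)
               for j in word_positions(text, b))

# Hypothetical sample text.
text = "The debtor was unable to pay. The school district appealed."
```

With this sample, `within(text, "unable", "pay")` holds, while
`precedes(text, "district", "school")` does not, which separates
"school district" from "district school".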

Even descriptive catalogers use links when they analyze a
document in two or more separate parts, numbering the report 2051a,
2051b, etc.

In the final analysis, it is possible to say that, except for
their mode of implementation, roles, role indicators, modifiers, and
modulants are synonymous; and that links, interfixes, link letters,
punctuation, and association are synonymous.

Mortimer Taube of Documentation Incorporated has recently
published a significant paper on roles and links.2 In this paper
he shows that the use of links as syntactical devices may lead
to loss of information.


6.2  Operating Experience with Roles and Links

Very  little  operating  experience  with  roles  and  links  has 
been  reported.  Certain  opinions  have  been  offered,  informally,  as 
to  the  cost  and/or  efficiency  of  these  devices,  but  they  have  not  been 
completely  substantiated. 

It  is  generally  admitted  that  the  addition  of  roles  or  links 
will  increase  input  costs;  by  how  much,  however,  is  not  known. 
Estimates  range  from  5  percent  to  50  percent. 

Whether or not roles and links truly improve system efficiency
has not been proven. Even those who insist that the system is more
effective do not indicate whether or not the percent of increase in
efficiency is equal to or in fair proportion to the increase in cost.
In one informal disclosure to the Documentation Incorporated study
group it was said that the proportion of false drops retrieved without
roles and links to the false drops retrieved with these devices in
the system was 4.5 to 1. It was admitted, however, that the test was
neither exhaustive nor conclusive; no cost comparison was made.

According to available information and material submitted to
NSF for Nonconventional Technical Information Systems in Current Use,
No. 3, which is in preparation, the following organizations are using
or about to use roles, but not necessarily links.

U.S. Patent Office, Research and Development Division
Central Intelligence Agency
Western Reserve University, Center for Documentation
  and Communication Research
National Lead Company, Titanium Division
Armour and Company, Patent Law Department
Philip Morris, Inc., Research Center
Linde Company, Research Laboratory
Eastman Kodak Company, Research Laboratories
E. I. du Pont de Nemours & Company:
  Engineering Information Center
  Polychemicals Information Center
Monsanto Chemical Company
Jonker Business Machines, Inc.
General Electric Co., Defense Systems Department

WRU, du Pont, and the Patent Office have published on
their application of roles and links. These papers describe
implementation more than operating experience; that is, they do not give
information on effectiveness, cost, etc.

It should be noted that roles must be tailored to the system. For
example, the A.I.Ch.E. roles would be of no use in legal material; in
fact, several chemical systems use different roles. Examples of the
A.I.Ch.E. roles are given below.

A.  Input  to  a  chemical  reaction,  physical  production, 
operation,  electrical,  or  mathematical  system. 

C.  Waste, contaminant, impurity.

D.  Agent

H.  Active  concept,  subject  of  study. 

Roles  in  a  legal  system,  on  the  other  hand,  may  be  as  follows: 

A.  Plaintiff 

B.  Defendant 

C.  Witness 

D.  Statute  cited  —  applicable 

E.  Statute  cited  —  not  applicable. 

Although roles and links generally are of the form described in
the preceding section, i.e., they are letters or numbers attached to
terms or item numbers, they do appear in other forms.

An example of a different configuration is shown below. This is
selected from a Western Reserve University coding sheet. In a
non-inverted system like WRU's, links especially must take a different
form.


1.  ..KEJ,        2.  ROD
3.  .KUJ,         4.  ALLOY
5.  .KUJ,         6.  AL
7.  ..KAM,        8.  ANNEALING

The double dots indicate the grouping of the terms
"rod", "alloy", and "aluminum". The comma indicates that
KEJ is one unit of information and that rod is another
which is associated with it. The single dot indicates
that KUJ and alloy (which are associated with each other
by the comma) are associated with the other terms between
the double dots.

KEJ is a role indicator. It shows that the word
"rod" is the name of a material which is acted on by a
process. When the other role indicators are translated,
the information in the sample is as follows: a rod is
shown as being composed of an alloy whose major
constituent is aluminum and this aluminum alloy rod is being
subjected to the process of annealing.
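The punctuation rules just described can be read as a small grouping
procedure. The Python sketch below is an interpretation of the sample
only, not WRU's actual coding rules: it assumes that cells alternate
between a role indicator and a term, that a double dot opens a new
group, and that a single dot continues the current group. The function
name `parse_coding_sheet` is hypothetical.

```python
def parse_coding_sheet(cells):
    """Group (role indicator, term) pairs by the dot punctuation.

    Assumes cells alternate: role code (with leading dots and a
    trailing comma), then the term the code applies to.
    """
    groups = []
    for code, term in zip(cells[0::2], cells[1::2]):
        dots = len(code) - len(code.lstrip("."))   # count leading dots
        role = code.strip(".,")                    # bare role indicator
        if dots >= 2 or not groups:                # double dot: new group
            groups.append([])
        groups[-1].append((role, term))
    return groups

# The eight cells of the sample coding sheet.
sample = ["..KEJ,", "ROD", ".KUJ,", "ALLOY", ".KUJ,", "AL",
          "..KAM,", "ANNEALING"]
groups = parse_coding_sheet(sample)
```

Under these assumptions the first group holds rod, alloy, and
aluminum with their role indicators, and the second holds annealing.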

Now that several groups have incorporated roles and/or links in
their systems, it is hoped that thorough studies on effectiveness and
cost will be made. Prima facie, it seems that in systems of deep
indexing and limited vocabulary, such devices may be useful and worth
the cost.

7.  EVALUATING COORDINATE INDEXING

There is a marked absence of rigorous tests and controlled
experiments in the theory and practice of coordinate indexing today.
Various attempts have been made to evaluate systems and applications of
theories of coordinate indexing; these have at best met with limited
success and at worst have been prejudiced and invalid.

This section of the report is an attempt to discover the reasons
for the inconclusiveness of evaluative studies, to indicate the
problems involved, and to report and suggest methods of analysis.

By "retrieval system" or "information system" here is meant any
means of providing access to a body of information. Thus, even a
book's index is a system.

7.1  Factors in Evaluation

The evaluation problem has several levels. In the first
place, coordinate indexing systems may be compared among themselves or
with other indexing systems, such as classification systems. Secondly,
the system may be evaluated within itself in order to determine its
effectiveness. This is an opinion of the users, who differ among
themselves. It is the opinion of the designer, who might say that his
system is the best available given the present state of the art. It is
the opinion of the indexers, who might say they are indexing as well as
possible. It is the opinion of the administrator, who might say that
it is the best system for the price.

Suppose, however, that the ineffectiveness of the system is
admitted. (Even if it is not, there are very few people who would say
there is no room for improvement in their systems.)

The faults in an ineffective retrieval system may lie in any, all,
or any combination of the following:

(1)  Acquisition
(2)  Indexing
     (a)  point of view
     (b)  depth of indexing
(3)  Index
     (a)  term arrangement
     (b)  format
     (c)  size
(4)  Constraints
     (a)  categorization
     (b)  pre-coordination
     (c)  "see" and "see also"
     (d)  role indicators
     (e)  links
     (f)  scope notes
     (g)  codes
(5)  Search procedure
     (a)  logical operations
     (b)  equipment capabilities
(6)  User education
     (a)  knowledge of system capabilities
     (b)  question formulation

Because this report is on the state of the art of coordinate
indexing and not on the larger subject, information storage and
retrieval, this discussion of evaluation will be restricted to studies
of indexing and of indexes with and without constraints.

It is understood, of course, that indexes and indexing are
dependent variables in a total system, and that a complete evaluation
must take the total system into account.1,2

It should be noted that the problem of evaluation is threefold:
(1) determining whether or not a system is effective in itself or as compared
to another, (2) determining where deficiencies lie, and (3) determining
the causes. Furthermore, the evaluation itself must be conducted in a
formal, logical, and valid way.

7.2  The Difference between Evaluating Indexing and Evaluating the Index

The difference between indexing and the index must be made clear.
This is important because in any system, the indexing may be perfectly
good, but the index so arranged that it is unwieldy or overcomplicated.
The reverse situation can also exist: the indexing might really be poor,
and no matter how perfectly arranged the index, the user will not
retrieve information.

Saying that the index is poor, then, implies something about the
arrangement of the index (including term arrangement, constraints,
format, and size).

Saying that indexing is poor implies something about the
intelligence of the indexer, that is, the indexer's experience in
indexing, his knowledge of the subject (or at least of the terminology
of the subject), and his capacity for being educated to the point of
view of the system.

Coordinate indexing in its free form has a great advantage
over other kinds of indexing in that the indexer is not required
to "fit" the terms of the document into a greater system, as he must
in a structured system. Nor is he required to formulate the
concepts of the document into concise phrases, as he must for an
alphabetic index such as a book index.

Nevertheless, several demands are made on the indexer's
intelligence. He must have an appreciation for the nature and use of
the index; he must know the subject matter well enough to be able to
choose the key words which best describe the document, and he must
also know how that document or the terms which describe it fit into
the total system.

Furthermore, with the addition of constraints, such as roles and
links or generic posting, the indexer must enlarge his effort in order
to recognize and name functions or relationships among terms.

In any attempt to evaluate indexing, these intellectual operations
of the indexer must somehow be taken into consideration. It is for
this reason that there has been little or no success in evolving
criteria by which indexing quality can be measured. There have been
studies made of indexing, however, and certain useful information has
been derived from the work. These studies are described in greater
detail below.

7.3  Indexing Studies

Nearly everyone who has ever been involved in indexing has
conducted the following very simple experiment: several people (of
related or different backgrounds) are given the same document; no limit
is placed on time or on the number of terms that can be used.

It is obvious that there are many variables in this experiment
and this partially accounts for the fact that results differ from
experiment to experiment and that the results from any one experiment
are difficult to analyze.

It may be of interest here to note a few findings of such an
informal experiment in Uniterm indexing in Documentation Incorporated's
Man-Machine Information Center.

(1)  Number of terms indexed varied from 10 - 45.

(2)  Time varied from 5 - 25 minutes.

(3)  The number of terms used was not proportional to
     time expended.

(4)  The number of terms in common was about six. (This
     was an average figure; in comparison of some sets of
     two results only one or two terms were found in
     common.)

(5)  Point of view was significant.

     (a)  Indexers with different technical backgrounds
          indexed differently. (For example, a report
          on pressure effects on aircraft pilots at high
          altitudes would be indexed differently by a
          psychologist and by a physicist. Both, however,
          would probably choose the major terms: pressure,
          high altitudes, pilot, aircraft.)

     (b)  Because the indexers, in this system, also
          retrieved and used the information in the
          store for providing answers to M-MIC clients,
          their indexing was influenced by their
          knowledge of system use. (For example, one
          indexer who was aware of a particular client's
          wish for all information on thin films was
          inclined to index any related information on
          the subject, even though the general value was
          small.)

(6)  Under different circumstances, the same indexer might
     index the same document differently. That is, even
     from one day to the next, although a certain number
     of terms (about seven) might be the same, the choice
     of less important terms varied.
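The "terms in common" figure in finding (4) can be computed
mechanically once each indexer's terms are recorded. The Python sketch
below shows one way to count it for every pair of indexers; the
indexer names, the term sets, and the function names are invented for
illustration and are not data from the M-MIC experiment.

```python
from itertools import combinations

def pairwise_overlap(term_sets):
    """For each pair of indexers, count the terms both assigned."""
    return {
        (a, b): len(term_sets[a] & term_sets[b])
        for a, b in combinations(sorted(term_sets), 2)
    }

def mean_overlap(term_sets):
    """Average number of terms in common over all pairs of indexers."""
    counts = list(pairwise_overlap(term_sets).values())
    return sum(counts) / len(counts)

# Hypothetical results for one report on pressure effects on pilots.
sample = {
    "psychologist": {"pressure", "high altitudes", "pilot", "aircraft",
                     "stress", "fatigue"},
    "physicist": {"pressure", "high altitudes", "pilot", "aircraft",
                  "barometry", "gas laws"},
    "engineer": {"pressure", "aircraft", "cabin", "pressurization"},
}
overlaps = pairwise_overlap(sample)
```

A count of this kind measures only agreement among indexers; it says
nothing about whether the agreed terms are the right ones for the
system.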

Especially from results (5) and (6) the importance of depth of
indexing becomes obvious. If the system were restricted to four or five
terms the differences in choice of terms would be negligible. This is
generally not found practical, however, because system users require
detailed information.

For example, the report mentioned above on effects of pressure
on aircraft pilots at high altitudes might be pertinent to a search for
information on pressure tests made with a particular kind of device. If
that device had not been indexed, the report would not be retrieved by
correlation of the terms in the question.
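The effect of indexing depth on a coordinate search can be shown with
a short Python sketch. The report numbers, the device term "pressure
chamber", and the function names are hypothetical; the search is a
plain intersection of postings, which is what "correlation of the
terms in the question" is assumed to mean here.

```python
def post(index, item, terms):
    """Post an item number under each of its index terms."""
    for term in terms:
        index.setdefault(term, set()).add(item)

def search(index, *terms):
    """Coordinate the question terms: items posted under every term."""
    postings = [index.get(term, set()) for term in terms]
    return sorted(set.intersection(*postings)) if postings else []

store = {}
# Report 7 is indexed by its four major terms only.
post(store, 7, ["pressure", "high altitudes", "pilot", "aircraft"])
# Report 8 is indexed more deeply and includes the test device.
post(store, 8, ["pressure", "high altitudes", "pilot", "aircraft",
                "pressure chamber"])

specific = search(store, "pressure", "pressure chamber")
general = search(store, "pressure")
```

Here the specific question retrieves only report 8, even if report 7
used the same device, while the general question retrieves both
reports and leaves the searcher to sort out which are pertinent.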

There are two possible "fail-safe" procedures: (1) to formulate
questions in the most general terms possible, or (2) to index 50 or 60 terms
per document. Both of these methods result in retrieval of excessive
nonpertinent information. (Information that is "nonpertinent" is not
necessarily a "false drop". This will be discussed further in a later
section.)

The M-MIC personnel frequently found that overindexing
contributed to the retrieval of excess material. For example, in a search for
information on digital differential analyzers, many reports were
retrieved which mentioned such devices only in passing.

Because of these informal findings, M-MIC indexers were
instructed to use only 10 - 12 terms per document where possible. This
was not a formal deduction from a scientifically correct experiment, but
its application was found to produce less extraneous information. It was
not determined whether or not information was being lost, but it was
believed that when the indexers were more restricted in number of terms
which could be used, they were more likely to weigh the relative merit
of that term in the context of the document.

In reviewing the experiment outlined above, several failings are
seen: (1) the experiment was not formalized, (2) it was not a "controlled"
experiment, (3) results were not carefully analyzed, and (4) the
(intuitive) implementation of the findings was not subsequently subjected
to rigorous test.

Study of the reports on coordinate indexing systems has disclosed
no fundamentally better experiments in indexing, nor any more formal
deductions from experiments. Many groups have reported that "it was
found" that ten terms best indexed an article, or that chemists indexed
chemical literature better than others did, or that it was necessary
to add others to the terms that an author chose.

It is significant that the converse of each of these findings is
reported with equal confidence.

Studies in indexing or in evaluating indexing are almost entirely
intuitive. Perhaps, because the indexer's intellectual operations must
be taken into consideration, deductive studies are not possible.

Described below are four possible exceptions to one or both of
these statements. One is an attempt to formulate a generalized theory
of indexing; the second and third are attempts to make deductions from
actual indexing experience; the fourth is the work in automatic indexing.

Jonker's Descriptive Continuum

Frederick Jonker's "The Descriptive Continuum -- A 'Generalized'
Theory of Indexing"3 describes a study based on the premise that "no true
understanding of existing systems and problems seems possible, unless all
systems can be seen in the light of more general common concepts, linking
all these systems together into a single 'closed' system."

The generalized theory of indexing postulated in this article
looks upon all indexing systems as a continuum, the descriptive
continuum. The main parameter of this continuum is the average
length of the "entries" or "headings" used. At one end of the continuum
or "spectrum" is keyword indexing; subject heading indexing is somewhere
in the middle, while hierarchic classifications are at the other extreme.
The average length of the headings or descriptive terms used determines
the position in the continuum.

Throughout the continuum, all other parameters behave as
functions of the average term length. Some of these parameters are:

--  Potential depth of indexing
--  Permutability of indexing criteria
--  Degree of hierarchical definition of indexing
--  Potential need for a coordinating mechanism
--  Retrieval noise
--  Size of the access apparatus
--  False coordinations
--  Capacity for handling semantic indeterminacy.

The theory indicates that once the main parameter, average term
length, is determined, all other properties of the indexing system are
fixed. For every information collection there is an "optimum" position
in the continuum, according to which the collection should be organized.
This optimum position is determined by the diffuseness of the
information in that particular field.

MacMillan and Welt Indexing Study

"A Study of Indexing Procedures in a Limited Area of the Medical
Sciences,"4 by Judith T. MacMillan and Isaac D. Welt, reports an
attempt to make deductions from actual operating experience.

The Cardiovascular Literature Project from which this study
emerged was an indexing-abstracting project, which over a period of
six years encompassed 31,000 articles.

The paper points up the point-of-view problem discussed earlier.
The indexing differences among chemists, physiologists, etc., were
found "more obvious than the similarities." Figures are given on
numbers of doubly-indexed papers which were indexed the same or
synonymously, which were duplicated but incompletely, etc.

Although  the  findings  of  this  report  are  more  often  inferences 
than  deductions,  this  work  does  represent  an  attempt  to  make  deductions 
from  actual  indexing  experience. 

Documentation Incorporated -- Rome Air Development Center Study

A study is now in progress at Documentation Incorporated, supported
by the Rome Air Development Center. This is a large-scale, statistical
investigation to determine indexing consistency among indexers, as well
as any one indexer's consistency, and to measure the effects of certain
learning or teaching aids and tools on indexer efficiency.

The experiment is being performed on chemical patents indexed by
experienced indexers, and should be completed by November 1962.
Automatic Indexing

It is curious that some of the basic studies of indexing have
been made in connection with automatic indexing.

These are, however, only studies or experiments. Even these
investigators have not found means of truly testing or evaluating the
effectiveness of their indexing because in nearly all experiments with
machine indexing, the "control" or the basis of comparison is human
indexing. The machine indexing is considered "good" if it compares
favorably with human indexing. This is paradoxical, because no one
knows when human indexing is "good".

Maron5 has said that if one is willing to collect enough
statistical data relating words and categories, and if one is prepared to
consider more and more of the relationships that exist between individual
words, word combinations, word type, etc., and categories, one can index
by machine with increasing accuracy. This statement refers to indexing
by categories.

Similar statements were made by Cherenin6 in an early paper in
discussing the problem of an information language.

Luhn's experiments in automatic indexing7 are directed to what
may be called a simpler problem: indexing by Uniterms. Here, as above,
statistical data are important. Luhn proposes to index by counting
frequency of occurrence of "notion" words in the document. Notion
words are opposed to "connective tissue," the and's, of's, for's, etc.,
in a sentence.
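The counting procedure described for Luhn's proposal can be sketched
in a few lines of Python. This is a minimal illustration, not Luhn's
own method in detail: the short list of "connective tissue" words, the
cutoff `k`, the sample text, and the function name are all assumptions.

```python
import re
from collections import Counter

# A small, assumed list of "connective tissue" words; a working
# system would need a much longer one.
CONNECTIVES = {"the", "a", "an", "and", "of", "for", "in", "on",
               "to", "is", "are", "by", "with"}

def notion_word_index(text, k=3):
    """Return the k most frequent notion words as index terms."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in CONNECTIVES)
    return [word for word, _ in counts.most_common(k)]

# Hypothetical document text.
document = ("Annealing of the alloy rod. Annealing hardens the alloy. "
            "Annealing is slow.")
terms = notion_word_index(document, k=2)
```

Frequency alone takes no account of the point of view of the system,
which is the reservation raised in the paragraph that follows.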

Before any of these systems can be proved effective it must be
shown that the point of view of the system can be taken into account,
that is, that terms can be selected on the basis of the design and use
of the system.

It should be pointed out that mechanization is not a solution to
indexing problems. That automatic indexing will in certain cases
substitute for human indexing is possible; that it will eliminate the
need for human indexing (or solve any of the associated difficulties)
is unlikely.

In conclusion, it can be said that the indexing studies described
in preceding sections have been scientifically unsatisfactory; i.e.,
they have been intuitive, informal, and without rigid controls. This
does not make them invalid nor useless, but serves to underline the fact
that indexing is a thought process and as such cannot be subject to
deductive analysis. The product of indexing, i.e., the indexer's
tracing sheet, may lend itself to deductive or comparative analysis, but
the study of indexing per se is necessarily intuitive.

7.4  Index and Index Constraint Studies

7.4.1  General Studies

Although indexing studies have been few, there is no lack of
index and index constraint studies. They range in scope from comparisons
of coordinate indexing and classification systems to measuring the value
of the addition of particular constraints (e.g., roles) to an index.

The constraints, as mentioned before, are

(a)  categorization
(b)  pre-coordination
(c)  "see" and "see also"
(d)  roles
(e)  links
(f)  scope notes
(g)  codes

It should be noted that although many of the tests have been
conducted with at least an attempt to be scientifically valid, results
(just as in indexing studies) are not always universally valid, and
perhaps with reason. Trying to apply someone else's experience with
roles, for example, is rather like expecting the medicine in another
man's cabinet to cure your own ills.

At least two major studies of comparison of coordinate indexing
with classification or other systems should be noted here. These are
Cleverdon's Aslib Cranfield Research Project8 and Schuller's report of
experience with UDC and Uniterms.9 Several other studies have, of course,
been made. Many of these are reported in ASTIA's report on the
Conference on Multiple Aspect Searching.10

The Cleverdon paper reports results of five tests of retrieval
efficiency and gives an "efficiency percentage" for UDC, alphabetical,
faceted, and Uniterm indexes.

Alphabetical indexes had the highest percentage in Test 1,
retrieval efficiency based on 300 searches by project staff. UDC
scored highest in Test 2, searches by engineering staff.

In Test 3, the indexes were rated by indexing time. With 16
minutes allowed, alphabetical indexes had the highest efficiency
percentage; with 12 minutes, UDC; with 8 minutes, alphabetical; with
4 minutes, Uniterm; with 2 minutes, UDC.

Test 4 compared results according to three indexers. For two
of the indexers, alphabetical indexes had the highest retrieval
efficiency; for the third, UDC.

Test 5 compared results according to subject. Uniterm indexes
had but a slightly higher percentage of retrieval for aerodynamic
papers (with alphabetical indexes close behind); for "other" subjects
the alphabetical indexes had the highest percentage.

Before  one  draws  too  many  conclusions  from  these  findings,  he 
should  be  reminded  that  the  tests  were  on  documents  dealing  with 
aeronautics  and  allied  subjects,  and  the  results  might  be  entirely 
different  in  other  areas. 

The Schuller study is on a much smaller scale than Cleverdon's.
One test result (on 100 queries) shows the following.

              Time      Irrelevant     Relevant
              (Min)     Documents      Documents

UDC           4042        4908            698
Uniterms      3981         528            739

with 482 found by both systems.

The Schuller study recommends, based on these and other tests,
the use of the Uniterm system for a collection of technical reports.
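
The relevant-document counts above permit a simple tally of how much the two systems complement each other. The following is a minimal sketch only; it assumes that the 482 documents found by both systems are relevant documents, which the table does not state explicitly, and the variable names are illustrative.

```python
# Relevant-document counts taken from the Schuller table.
relevant_udc = 698
relevant_uniterm = 739
# Assumption: the 482 documents "found by both systems" are relevant ones.
found_by_both = 482

# Inclusion-exclusion: distinct relevant documents found by either system.
relevant_either = relevant_udc + relevant_uniterm - found_by_both
only_udc = relevant_udc - found_by_both
only_uniterm = relevant_uniterm - found_by_both

print(relevant_either, only_udc, only_uniterm)  # 955 216 257
```

Under that assumption, each system retrieved a few hundred relevant documents that the other missed.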

This is not the conclusion one might reach from the Cleverdon
study, and serves to point out what has been constantly repeated: that
test results, even when correct, are not universally applicable.

Studies on coordinate indexing systems with or without
constraints are too numerous to detail. Reports on such studies are
referenced throughout the bibliography. (Work on roles and links is
detailed to some degree in Section 6 of this report.)

In  the  following  three  sections,  however,  are  discussed  three 
particular  methods  of  evaluating  or  analyzing  indexes. 

7.4.2  False  Drops  as  Test  of  Index  Effectiveness 

It would seem possible to devise means of testing
system effectiveness by analyzing false drops. The question
immediately arises: What is a false drop?

As Mr. B. K. Dennis suggested (in a private communication), it is
very difficult to define, let alone analyze, a false drop. Whether or
not the material answers the query is subject to the opinion of (1) the
man who indexes, (2) the searcher who formulates the question, (3) the
technical evaluator who screens retrieved material before forwarding it
to the user, and (4) the user (one individual may perform several or all
of these functions). A system, he pointed out, may correctly retrieve
nonpertinent information.

It may, for example, retrieve an item correct in all respects except,
say, temperature range or geographical location. Perhaps the user did
not specify these criteria in his question, or perhaps such terms were
never entered into the index.

Another possible "correct" nonpertinent retrieval was suggested
previously. A document may indeed mention digital differential
analyzers and yet not provide the user the information he wants.

Still another kind of "correct" false drop is encountered in
certain systems using generic levels. Documents discussing cocker
spaniels may be retrieved when the system is queried for information
on fox terriers, if both were posted on "dogs", and the user included
that generic term to insure that no items on fox terriers were missed.

Although analysis of false drops seems a possible method of
evaluating system performance, there have been no reports of success in
this method. Nor has anyone disclosed a means of analyzing the reasons
for nonpertinency of material received.

The only profitable use made of false drops, or what are more
properly called false coordinations, is in the checking of search
procedures or equipment.
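
The generic-level case can be illustrated with a small item-on-term file. This is a sketch under assumed data: the document numbers and postings are invented, and the function name is hypothetical.

```python
# Item-on-term postings: each term maps to the document numbers posted on it.
# All numbers are invented for illustration.
postings = {
    "fox terriers": {101, 102},
    "cocker spaniels": {201, 202},
    "dogs": {101, 102, 201, 202, 301},  # both breeds are also posted generically
}

def search_any(terms, postings):
    """Logical sum (union) of the postings for the given terms."""
    result = set()
    for term in terms:
        result |= postings.get(term, set())
    return result

specific = search_any(["fox terriers"], postings)
broadened = search_any(["fox terriers", "dogs"], postings)

# Items brought in only by the generic term: correctly retrieved by the
# system, but possibly nonpertinent to a question about fox terriers.
extra = broadened - specific
print(sorted(extra))  # [201, 202, 301]
```

The system has done exactly what was asked of it; whether items 201, 202, and 301 are "false" depends on the opinion of the user.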

7.4.3  Use Statistics

Studies in frequency of term usage both in index and
search, and studies of the "association factor", are under way. Findings
from such investigations are potentially useful not only in
evaluating but in improving a system.

At ASTIA, three computer-prepared aids are under study. These
are printouts showing:

(1) Descriptor Frequency Listing by Document Assignment and by
Bibliography Usage. An example of such a printout is shown below.

DESCRIPTOR          ASSIGNMENT     REQUEST

Jet Planes             2216           37
Jet Seaplanes            22            9
Tungsten Wire            23            0
Water Entry             123            2

(This and following examples are taken from a memo to all ASTIA
divisions dated 30th October 1961, and signed by J. Heston Heald, Chief,
Document Processing Division. The examples are "sample formats of
proposed reference tools".)


(2) Low Frequency Descriptor Manual File.

Descriptor          AD Numbers

Alpha Chambers      204 929
Demerol             209 426
First Aid Kits      219 222 127 912
Hall Damage         219 220 427 921

Thus, where a descriptor is used infrequently, the term and reference
will be maintained in a manual file.

(3) List of Context Descriptor Sets

Cameras (138)
    Aerial Cameras (11)
    Airborne (11)
    Amplitude Modulation (15)
    Combustion (1)
    Design (48)
    Detection (43)
    Instrumentation (25)
    Photography (14)
    Power Supplies (16)
    Training (3)
    Transducers (1)
    Underwater Photo. (1)
    Upper Atmosphere (2)
    Vibration (2)

This printout shows all terms which have been indexed with the
term "camera" and how often. ASTIA has not yet an economically
practical (i.e., fast) way to get this count.
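
The count behind a context descriptor set is a tally of co-assigned descriptors. The sketch below shows one way it might be made from term-on-item records; the documents are invented, the function name is hypothetical, and nothing here describes ASTIA's actual routine.

```python
from collections import Counter

# Term-on-item records: document number -> descriptors assigned to it.
# The records are invented; only the idea of the count follows the text.
documents = {
    1: {"Cameras", "Aerial Cameras", "Design"},
    2: {"Cameras", "Design", "Detection"},
    3: {"Cameras", "Photography"},
    4: {"Detection", "Transducers"},
}

def context_set(term, documents):
    """Count how often every other descriptor is indexed together with `term`."""
    counts = Counter()
    for descriptors in documents.values():
        if term in descriptors:
            counts.update(descriptors - {term})
    return counts

cameras = context_set("Cameras", documents)
for descriptor, n in sorted(cameras.items()):
    print(f"{descriptor} ({n})")
```

Every document record must be read once for each term examined, which suggests why the count is costly on a large file.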

From list (1), the terms which are never, or infrequently used,
can easily be found and eliminated or modified.

From list (2) infrequently used terms which are retained in the
system are noted and removed to manual files so as to eliminate extra
search time on the computer.
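
Both uses amount to simple filters over the frequency listing. The sketch below applies them to the four sample descriptors of list (1); the function names and the cutoff of 25 assignments are assumptions made for illustration, not ASTIA figures.

```python
# (descriptor, times assigned to documents, times used in requests),
# copied from the sample Descriptor Frequency Listing.
listing = [
    ("Jet Planes", 2216, 37),
    ("Jet Seaplanes", 22, 9),
    ("Tungsten Wire", 23, 0),
    ("Water Entry", 123, 2),
]

def never_requested(listing):
    """List (1) use: descriptors assigned to documents but never searched on."""
    return [name for name, _assigned, requested in listing if requested == 0]

def manual_file_candidates(listing, max_assignments=25):
    """List (2) use: descriptors assigned so rarely that a manual file may serve.

    The cutoff of 25 assignments is an assumed figure.
    """
    return [name for name, assigned, _requested in listing
            if assigned <= max_assignments]

print(never_requested(listing))         # ['Tungsten Wire']
print(manual_file_candidates(listing))  # ['Jet Seaplanes', 'Tungsten Wire']
```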

(The  reverse  of  (2)  could  be  useful  In  a  system  for  document 

ii 

inventory control. That is, a printout showing number of times a
document was retrieved in a search would indicate the document's
relevancy to a system.)

The usefulness of a tool such as list (3) has long been
recognized. Documentation Incorporated's "EDIAC" (Electronic Display
of Indexing Association and Content) was a device which displayed the
logical sum of all other words used in indexing any document indexed by
the first word entered in the device. This, like the ASTIA printout,
would enable a searcher to select an additional word to coordinate with
the first word. In the EDIAC method the searcher would know that at
least one document in the system was indexed by both terms; in the ASTIA
method, he would also know how many documents are indexed by both words.

A further refinement of this method has been proposed by Stiles."
The first step in the procedure is to develop a list of terms arranged
according to their degree of association with a given term. ASTIA's
method does not give this, but only frequency of association. In the
paper referenced, the formula used for the association factor (to
measure degree) and further details of the procedure are given. The
method is still under trial, and no conclusive results have been
achieved; the economic feasibility of such an approach has not yet been
shown. The goal, however, of such a method is indeed a worthy one: to
find documents related to a request even though they have not been
indexed by the exact terms of the request, and to present the documents
in the order of their relevance to the request.
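
The difference between frequency of association and degree of association can be shown in a few lines. Stiles's own formula is given in the paper referenced and is not reproduced here; the sketch below substitutes a plain overlap ratio (shared documents divided by documents carrying either term) purely as a stand-in, and the postings and function names are invented.

```python
# Item-on-term postings with invented document numbers.
postings = {
    "cameras": {1, 2, 3, 4},
    "photography": {2, 3, 4, 5},
    "design": {1, 6, 7, 8, 9},
    "vibration": {9},
}

def overlap_ratio(a, b):
    """Stand-in association measure: shared documents / documents with either term."""
    either = a | b
    return len(a & b) / len(either) if either else 0.0

def ranked_associates(term, postings):
    """Other terms ordered by decreasing degree of association with `term`."""
    base = postings[term]
    scores = {t: overlap_ratio(base, docs)
              for t, docs in postings.items() if t != term}
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)

ranking = ranked_associates("cameras", postings)
for other, score in ranking:
    print(f"{other}: {score:.3f}")
```

A degree measure of this kind discounts a term that is heavily used everywhere, which a raw co-occurrence count does not.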

Mary Elizabeth Stevens is also working on a means of leading
the searcher to an answer even if he does not ask the question properly.
In her "I.Q." or "Selective Recall" system a computer program is
evolved to permit multiple levels of generic, specific, and associative
access to items in an index store.

Claire Schultz's work in use statistics should also be noted. In
a study of the indexing terms of the Merck Sharp & Dohme Research
Laboratories indexing system (reported by Schultz and Shepherd),13 the
frequency of use of the indexing terms, both singly and in combination
with one another, was determined. In another study, Mrs. Schultz and
her associates conducted a comparative study of the dictionaries of the
Merck Sharp and Dohme punched card system and the ASTIA computer system.14

Computer routines were used in both studies. The summary of the
results of the latter study is quoted below to illustrate how such
findings might be used in system or index evaluation.

"(1) In both systems the shape of the curve for
the occurrence of single descriptors in the
sample is one-half of a U. The same is true for
the occurrence of pairs of descriptors.

(2) One system generates more double, triple
and quadruple combinations of descriptors than
the other. This is because that system has a
higher number of descriptors per document.
Descriptors are ordinarily searched for in triple
and quadruple combinations. Whether or not the
system providing the higher number of descriptor
combinations has the more useful file remains to
be scored.

(3) A curve has been drawn for each system,
measuring the use of single descriptors in terms
of the average number of uses of descriptors in
that system. The two curves nearly coincide. It
is hypothesized that the shape of this curve
(nearly a straight line for cumulated frequency
of use of descriptors) is an intrinsic characteristic
of dictionary use, to which dictionary use
in all systems can be related. More systems will
have to be analyzed before an acceptable standard
can be chosen for interpreting the meaning of
deviations in the curves of individual systems."

7.4.4  User  Studies 


User  studies  differ  from  the  use  statistics  studies 
described  above  in  that  the  former  refer  to  actually  studying  the  user 
rather  than  the  system. 

Most of the user studies are determinations of the requirements
of the users. Mortimer Taube prepared "An Evaluation of 'Use
Studies' of Scientific Information"15 in 1958 in which he analyzes such
work. At that time, he says, the consensus was that the studies
were not of much value in the design of a system. Some objected to the
studies on technical grounds and questioned the methods of sampling,
interviewing, etc. Dr. Taube analyzes the reasons for the "generally
accepted failure" of use studies by establishing a distinction between
consumer services and professional services. He concluded that the
organization and dissemination of scientific information is a professional
activity, and that such responses cannot supply directions for the
design of more effective scientific information and reference systems.

Consumer acceptance of information systems is without a doubt
important. This is, however, primarily a packaging problem and a user
education problem. A consumer will tend to buy an attractive box and
will not want something he finds difficult to use.

User studies based on interviews with scientists in order to
determine where they seek information fit in the consumer acceptance
area, but use studies based on analysis of the kind of reference
questions a library receives more nearly apply to the design and
evaluation problem. Saul and Mary Herner (Herner and Company) have
conducted both kinds of studies. Especially the latter, they believe,
can be used in the design or "tailoring" of classification systems.
Quoted below are two paragraphs from a report by the Herners; in this
statement is outlined the kind of useful knowledge they believe can be
gained from user studies.16

"Unfortunately, for the time being, we have to content ourselves
with by-products. The first was an analysis of the subject content and
logical structure of the reference questions we had collected in our
course of the study (2)*. The two most striking findings of this
by-product study were that over a fifth of the questions, all of which had
been delegated to nuclear energy libraries, were completely non-technical
and the bulk of the technical questions involved two concepts,
these concepts being almost always related as logical products. Of
course, it can be argued that the questions we collected were addressed
to manual retrieval systems that did not lend themselves to
multiple-subject correlations. Whether requestors would address more complex
questions to correlative systems is a question worthy of study on a
rigorous basis and susceptible to such study.

"Our second by-product -- the finding of small coincidence between
nuclear energy reference questions and nuclear energy reports -- is also
worthy of further consideration. If, as we suspect, reports are almost
exclusively tools of current awareness having little retrospective
value, the tremendous and costly efforts that are being made to
organize reports for searches would seem to be futile in the extreme.
On the other hand, efforts to use subject analysis of reports as a
means of pin-pointing current disseminations to interested individuals
or groups would seem to be extremely promising. Both would make
useful subjects of study."


7.5  Conclusions 

If a distinction is made between the internal and external
factors in information systems, it is seen that both indexing and user
studies fall in the "external". Index studies, however, are "internal."


* Reference here is to an earlier study, "An Experiment in the Use of
Reference Questions in the Design of a Classification System."

That indexing studies are external criteria of system design is
not generally recognized. As mentioned in Paragraph 7.2 above, the
indexer's experience and knowledge, both of the system and of the
subject, play an important role in indexing. Just as consumer acceptance
or user requirements are subjective, so is indexing.

The internal and external factors, of course, are not mutually
exclusive. For example, if it can be determined that a cross-reference
structure increases the consumer acceptance of a system, the design of
the system could be affected.

The key to an understanding of the interplay of internal and
external criteria is found in the distinction between individual
short-term and massive long-term consumer response.18 The designers
of information systems must exhibit a professional competence which
implies the ability to utilize internal criteria as a measure of
consumer acceptability; and they can only be required to question their
criteria if massive and long-term consumer dissatisfaction proves the
criteria and the systems based upon them to be inadequate.

8.  MECHANIZING COORDINATE INDEXING SYSTEMS

Because of the intrinsic simplicity of a coordinate indexing system,
there are many methods and techniques of mechanization. In order to
fully appreciate this simplicity, a brief review of the principles of
coordinate indexing and their implementation is presented below.

8.1  Description  and  Definition 

In the original Uniterm system of coordinate indexing, (1) the
index term was a class name (airplane, computer, control panel, lighting,
etc.), (2) the posting in the system was item-on-term and not
term-on-item, and (3) the user found relevant material by matching or
comparing item numbers.

This is still a correct description of a coordinate indexing system,
but certain variations and modifications have appeared. Now, for
example, there are also coordinate indexing systems which compare term
numbers in term-on-item systems. Furthermore, although all coordinate
indexing systems use class names,* certain hierarchical classification
schemes have been brought into play. This is discussed in Sections 4
and 5 of this report.


* "Class names" covers all the following: Uniterms, descriptors,
keywords, unit terms, selectors, locators, etc.

A coordinate index is distinguished by the fact that the user
himself (perhaps with mechanical assistance) performs the coordinations.
He does this with terms and items coded and arranged in such a way that
they can be readily susceptible to logical operations.

8.2  Logical Operations

A typical logical operation of coordination is finding the
logical intersection: this class and that. However, all the Boolean
functions may be represented by machine operations in the system, as
required by the search techniques.

The possible logical operations are listed below:

1. This class and that (intersection)

2. This class or that (union)

3. This class but not that (negation)

4. Combination of this class and that plus either
or both of two others (combination of logical
intersections and/or logical unions)

5. Other complex combinations of (1), (2), (3).
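
The operations listed above correspond directly to set operations on item numbers. The following is a minimal sketch over an item-on-term file; the terms are those named in Paragraph 8.1, but the item numbers and variable names are invented for illustration.

```python
# Item-on-term postings; item numbers are invented.
postings = {
    "airplane": {1, 2, 3, 5},
    "computer": {2, 3, 4},
    "control panel": {3, 5, 6},
    "lighting": {5, 7},
}
p = postings

intersection = p["airplane"] & p["computer"]   # 1. this class and that
union = p["airplane"] | p["computer"]          # 2. this class or that
negation = p["airplane"] - p["computer"]       # 3. this class but not that
# 4. this class and that, plus either or both of two others
combined = (p["airplane"] & p["control panel"]) & (p["computer"] | p["lighting"])

print(sorted(intersection))  # [2, 3]
print(sorted(union))         # [1, 2, 3, 4, 5]
print(sorted(negation))      # [1, 5]
print(sorted(combined))      # [3, 5]
```

More complex requests (operation 5) are built by nesting the same three operators.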

8.3  Coding  and  Arrangement 

Coding and arrangement vary from system to system. Certain
generalizations may be made, however, about all coordinate indexing systems.

Either the term or the item or both may be designated by a number
code. In an inverted (item-on-term) system, where terms are compared
for item matches, it is obviously expedient to have the item in the
briefest possible form. If a six-digit number is used, 999,999 items can
be designated by means of only the numbers 0 through 9.

The same holds true in term-on-item systems where it is expedient
to designate terms by, say, a six-digit number.

Some systems use numbers for both the term and the item. Others
leave either the item or term, whichever is not "read" for matching
purposes, in alphanumeric form.

In manual systems, the item and/or term numbers are usually Arabic
numerals. However, as in Batten or peek-a-boo and edge-notched cards,
these numbers may then be designated by hole positions.

In mechanized systems, the Arabic numbers become binary numbers
and are coded by various means. Binary numbers are used because they
require only two codes, one for zeros and another for ones.

(1) In punched card or tape systems, the binary numbers
are coded by "hole" and "no hole".

(2) In microfilm systems, the binary numbers are coded
by opaque and transparent markings.

(3) In magnetic tape or magnetic card systems, the binary
numbers are coded by magnetized and nonmagnetized areas.
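
The conversion from an Arabic item number to a two-state code can be sketched as follows. The function names, the field width, and the symbols chosen for "hole" and "no hole" are all assumptions made for illustration; actual card and tape layouts vary from device to device.

```python
def to_binary(number, width=20):
    """Binary digits of `number`, left-padded with zeros to `width` places."""
    if number < 0 or number >= 2 ** width:
        raise ValueError("number does not fit in the given width")
    return format(number, f"0{width}b")

def as_holes(bits, hole="O", blank="."):
    """Render ones as a hole and zeros as no hole (the symbols are arbitrary)."""
    return "".join(hole if bit == "1" else blank for bit in bits)

bits = to_binary(1961, width=12)
print(bits)            # 011110101001
print(as_holes(bits))  # .OOOO.O.O..O
```

Twenty binary places suffice for any six-digit item number, since 2 to the 20th power (1,048,576) exceeds 999,999. The same string of ones and zeros could equally stand for opaque and transparent marks or for magnetized and nonmagnetized areas.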

In a mechanized system, the search question must also be coded.
Not only must the terms be specified, but also the logical operations.
The logical operation may also be coded by binary numbers, e.g.,
01 = union, 11 = intersection, 10 = negation. There are, of course,
many other instructions and codes that are used, e.g., in programming a
computer, but we will not concern ourselves with these.
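
A coded search question of this kind might be interpreted as in the sketch below. Only the three two-digit operation codes come from the text; the term codes, item numbers, left-to-right order of evaluation, and the function name are assumptions.

```python
# Two-digit binary operation codes as given in the text.
OPERATIONS = {
    "01": lambda a, b: a | b,  # union
    "11": lambda a, b: a & b,  # intersection
    "10": lambda a, b: a - b,  # negation (a but not b)
}

# Hypothetical postings keyed by numeric term code.
postings = {
    17: {100, 101, 102},
    42: {101, 102, 103},
    58: {102},
}

def run_query(first_term, steps, postings):
    """Apply (operation code, term code) steps from left to right."""
    result = set(postings[first_term])
    for op_code, term in steps:
        result = OPERATIONS[op_code](result, postings[term])
    return result

# Term 17 and term 42, but not term 58.
answer = run_query(17, [("11", 42), ("10", 58)], postings)
print(sorted(answer))  # [101]
```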

The method of carrying out mechanized searches varies with the
arrangement of the store: whether it is a term-on-item or an
item-on-term system and whether codes are randomly or sequentially ordered.
This will be discussed in further detail in Paragraph 8.5.
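
How the arrangement of the store shapes the search can be seen by running the same question against both arrangements. This is a sketch with invented item numbers and term codes, and the function names are hypothetical; it is not a description of any particular device.

```python
from collections import defaultdict

# Term-on-item store: each item number lists its term codes (invented data).
term_on_item = {
    1001: [17, 42],
    1002: [42],
    1003: [17, 42, 58],
}

def invert(term_on_item):
    """Build the item-on-term (inverted) store from a term-on-item store."""
    item_on_term = defaultdict(set)
    for item, terms in term_on_item.items():
        for term in terms:
            item_on_term[term].add(item)
    return dict(item_on_term)

def search_term_on_item(store, wanted):
    """Serial scan: read every item and compare its term codes."""
    return {item for item, terms in store.items() if wanted <= set(terms)}

def search_item_on_term(store, wanted):
    """Look up only the wanted terms and match their item numbers."""
    return set.intersection(*(store.get(term, set()) for term in wanted))

item_on_term = invert(term_on_item)
wanted = {17, 42}
print(sorted(search_term_on_item(term_on_item, wanted)))  # [1001, 1003]
print(sorted(search_item_on_term(item_on_term, wanted)))  # [1001, 1003]
```

Both arrangements give the same answer; the term-on-item search reads the whole file, while the item-on-term search reads only the entries for the terms in the question.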

Because, as we have seen, coordinate indexing systems rely only on
logical operations with numerical codes for terms and items, they are
ideally suited to mechanization. When roles, links, or semantic codes
are introduced into a system, certain modifications must be made, but
the procedures remain generally the same.

There are many means presently available for mechanizing or
partially mechanizing an information storage and retrieval system based
on coordinate indexing. Devices currently used range in sophistication
from high-speed, large-capacity, random-access, general-purpose computers
to comparatively simple devices such as Jonker Business Machines,
Inc.'s Minimatrix.

There are those who feel that the most significant lack in
mechanized systems is that of an automated input -- even to the extent
of the selection of indexing terms.

8.4  Input

Index terms must be entered manually into the system for
conversion to machine-manipulable codes or for position-coding, such as
on peek-a-boo cards. Even in microfilm systems where text can be
entered directly into the system, manual indexing and coding is still
required. It is sometimes implied that in a microfilm system complete
searching of text is possible. What is actually the case, however, is
that once an apparently relevant item has been retrieved via the coded
index, the user can scan the immediately available text. Examples are
devices such as Minicard, Filesearch, and the Rapid Selector, which will
be discussed in greater detail in Section 2d. Another more recent
development, reported by National Cash Register, is its high-density
document storage system utilizing a photochromic microimage memory.2 In
none of these, however, is manual input or manual coding eliminated; and
at the present time all searching devices make use of some kind of code:
hole position, numbers, optical patterns, binary codes, etc.

8.4.1  Coding  Devices 

Codes are prepared by various means. The most common
devices are IBM or Remington-Rand punched card or tape machines,
Justowriters or Flexowriters (tape-producing typewriters), the Jonker
400 Termatrex card hole-puncher, and similar devices. These are
neither high-speed nor "automatic", i.e., all require operators and are
limited by the operators' speed.

IBM is investigating the possible use of "Stenotype" in I.R. Stenotype
is used by stenographers; it produces a paper tape in which words
are abbreviated or "coded" in shorthand form.

Optical and magnetic coding devices will be discussed in greater
detail in Paragraph 8.5; these, too, rely on manual input of data.

8.4.2  Character  Recognition 

It has been proposed that character recognition
devices could be used as input devices -- to eliminate the manual
operations of coding input data such as index terms or full text. Most
existing readers, however, read only text which has been specially
prepared manually. That is, the codes or "characters" are manually imposed
on the item; examples of this are the American Banking Association's
magnetic-ink character readers,3,4 or the British "Luton Experiment"
phosphorescent-coded-mail sorting devices.5,6,7 The character readers,
then, are simply analogous to punched-card reading equipment.

Optical scanners on the market manufactured by Farrington Manufacturing
Company, like the devices mentioned above, can only read specially
prepared text, or, at best, only selected type fonts.

Existing character readers do not accomplish either of the two
things which could make them useful in I.R.: (1) code automatically
(i.e., prepare machine input directly from index terms or full text),
or (2) eliminate coding by permitting direct search of printed or
written indexes or texts.

There are, however, several competing firms engaged in research on or
development of character readers. In August 1960, the Wall Street
Journal reported activities by IBM, NCR, RCA, Remington-Rand,
Farrington, and Baird-Atomic. Work is being performed by Bell Telephone,
Sandia Corporation, and the University of Michigan.9 Rabinow Engineering,
Philco Research Center, and General Electric Computer Laboratory are also
engaged in character recognition studies. This is only a partial list
of U.S. companies involved. Some foreign companies engaged in this
work are Compagnie des Machines BULL (France), Electronic and Musical
Industries (England), and Telefunken (Germany).10

One of the more advanced readers, a working developmental model of
which has been completed, is Baird-Atomic Inc.'s optical print reader.
Although this was designed to meet requirements of the Air Force machine
translation program, with very little modification it could be used in
indexing, searching, and storing. This equipment generates a magnetic
tape directly from text on 70-mm. film. This is, in effect, producing
the machine-manipulable codes mentioned above. Because the tape
produced carries all printing information (fonts, spacing, etc.), it
could also be used to operate output printers for graphic-arts quality
printout. As will be discussed in Paragraph 8.6, such printers are still
in experimental stages. In fact, in order to check the magnetic tapes
produced by its character reader, Baird-Atomic finds it necessary to
punch a paper tape and then read this on a paper-tape reader. The
principle of operation of the reader was described in a report to the
U.S. House of Representatives Committee on Science and Astronautics.11
Other printed reports are not available; therefore, some additional
information obtained during a visit to the company is detailed here.

Certain limitations of the system were brought to light, not the
least of which is the cost. Final models will probably range from
$200,000 to $400,000. A limited market is anticipated because of the
speeds available; one man at Baird-Atomic predicted that two machines
would handle "all the Russian material". Although speeds vary according
to the number of fonts being read, a reported minimum speed is 60
char/sec.

At present, the machine works only from 70-mm. film. It could be
modified to handle some other film, for example, 16-mm. microfilm, but
it cannot read opaque material.

The machine could also be modified, and perhaps with some cost
saving, to read only one or two type fonts. (This might be useful for
handling typewritten abstracts or documents.) Twelve fonts are
accommodated in the developmental model. This means that at any one
time 12 fonts can be read. The 12 fonts can be chosen from any number
of fonts, and preparation of font discs is relatively simple; one method
involves sandwiching a piece of film between two layers of glass. Type
sizes, however, are critical, because the appearance of the letters
varies from size to size and a font disc is required for each size. There
is no way to compensate for this size variation by using, for example,
different reduction ratios.

The Baird-Atomic character reader does not "read" graphs, drawings,
or vertical or slanted printing.

Despite the limitations enumerated above, Baird-Atomic's print
reader represents significant progress in the field of character
recognition.

What may be called somewhat more sophisticated techniques of
character recognition are being investigated by Rabinow Engineering.
Their work has been primarily in pattern matching techniques (as in the
Rabinow Universal Reader). Their method differs from other matching
techniques in that, upon converting the given character into an
electronic replica and comparing the replica to a standard set of stored
electronic images, a quantitative measure of the best match is obtained.
Rabinow is also exploring curve tracing, in which the character elements
are either traced by a spot of light (generated by a cathode ray tube,
for example) or followed by a radar tracking technique which needs no
moving light source. The directions in which the character elements go
are then recorded, and the character is recognized by the sequences and
lengths of the curve-traced elements.12 These techniques are potentially
applicable to the reading of handwriting; machines which rely on exact
match do not have this potentiality.
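
The difference between an exact match and a quantitative best match can
be illustrated with a small sketch. It is not Rabinow's circuitry: the
5 x 3 bitmaps, the agreement score, and the names match_score and
best_match are assumptions chosen only to show how a damaged character
can still be assigned to its nearest stored image.

```python
from typing import Dict, List, Tuple

Bitmap = List[str]  # rows of '#' (ink) and '.' (blank); hypothetical format

# Hypothetical stored images; a real reader would hold one per character.
TEMPLATES: Dict[str, Bitmap] = {
    "I": ["###", ".#.", ".#.", ".#.", "###"],
    "L": ["#..", "#..", "#..", "#..", "###"],
    "T": ["###", ".#.", ".#.", ".#.", ".#."],
}


def match_score(a: Bitmap, b: Bitmap) -> float:
    """Fraction of cells on which two equal-sized bitmaps agree."""
    agree = [x == y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b)]
    return sum(agree) / len(agree)


def best_match(unknown: Bitmap, templates: Dict[str, Bitmap]) -> Tuple[str, float]:
    """Return the name and score of the stored image closest to the input."""
    scores = {name: match_score(unknown, image) for name, image in templates.items()}
    name = max(scores, key=scores.get)
    return name, scores[name]


# A "T" with one ink cell missing: an exact-match reader would reject it.
noisy_t = ["###", ".#.", ".#.", "...", ".#."]
name, score = best_match(noisy_t, TEMPLATES)
print(name, round(score, 3))
```

Because every stored image receives a score, a threshold on the best
score (or on its margin over the runner-up) could be used to reject
doubtful characters rather than misread them.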

In this report, we have intended only to show the level of
achievement and to indicate what use coordinate indexing systems may
make of character readers. For a full summary of the state of the art of
character recognition, the reader is referred to Mary Stevens' report
published by the National Bureau of Standards.13

In conclusion it may be said that although considerable time and
effort must still be expended in perfecting a variable-font, full-page,
high-speed print reader, sufficient advance has been made to permit
reasonable anticipation of such a device.

8.4.3  Automatic  Indexing 

Luhn's keyword-in-context (KWIC) indexes are discussed
elsewhere in this report; these have proven practical in limited
application. (A similar method is used in permuted title word
indexing.16) The technique is more nearly a means of mechanically
producing an index than automatic indexing. Luhn is, however, also
working on mechanical methods of indexing and abstracting by "counting"
or determining the frequency of occurrence in documents of "notion"
words.

That such a method will be suitable for many types of users is
probable. This method cannot be adequately evaluated, however, in terms
of practicality and economy, until automatic character readers become a
reality.
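
The frequency-counting idea can be sketched in a few lines. This is an
illustration only, not Luhn's actual procedure: the list of common words
to be ignored, the minimum count, and the name notion_words are all
assumptions made for the example.

```python
import re
from collections import Counter
from typing import List, Tuple

# Hypothetical list of common words that carry no "notion".
COMMON_WORDS = {
    "the", "of", "and", "a", "in", "to", "is", "are", "for", "on",
    "by", "with", "be", "this", "that",
}


def notion_words(text: str, min_count: int = 2, top_n: int = 5) -> List[Tuple[str, int]]:
    """Return the most frequent non-common words with their counts."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in COMMON_WORDS)
    return [(w, n) for w, n in counts.most_common(top_n) if n >= min_count]


sample = (
    "Magnetic tape stores the index. The index is searched on magnetic "
    "tape, and the tape is read serially."
)
print(notion_words(sample))
```

The words that survive the count would serve as candidate index terms;
in practice the text would first have to be in machine-readable form,
which is why the evaluation depends on character readers.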

8.5  Store  and  Search 

Once the material of the system has been indexed, it must be
incorporated in the total store and must be retrievable. Methods of
entering the data into the store have been briefly discussed above and
will be more fully described in this section.

In some coordinate indexing applications, both the index and the
material indexed (text) are part of the "store"; in others, only the
index is the "store". In the latter case, the physical store of material
indexed is never actually searched. This is also true in the first case
since, whether the text is stored in the system on microfilm, magnetic
tape, or otherwise, it is not searched, but is merely available for
print-out or on-the-spot scanning. Thus, for our purposes, only the
index store is of interest.

8.5.1 EAM and Manual Systems

Many physical forms of coordinate indexes are well known and will
not be detailed here. Some of these are Uniterm cards or printed
double-dictionary Uniterm indexes,17 the rather recent Tabledex
indexes,18,19 and the familiar peek-a-boo and edge-notched cards. (The
references here are to recent or definitive literature; there is,
however, considerable other material available which is referenced
throughout the bibliography, Section 12 of this report.)

The NBS Microcite Machine is based on an interesting extension of
the peek-a-boo principle. In the usual peek-a-boo system, the position
of a hole is interpreted as a document serial number. In the Microcite
concept, the searcher views a description of the document at each hole
position. The description is on microfilm which is projected and
enlarged on a screen.
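
The peek-a-boo principle lends itself to a compact illustration. In the
sketch below each term card is modeled as an integer whose set bits
stand for drilled hole positions; superimposing cards is then a bitwise
AND. The document numbers and the function names are hypothetical.

```python
from typing import Iterable, List


def term_card(doc_numbers: Iterable[int]) -> int:
    """Drill one hole per document serial number (bit n = hole n)."""
    card = 0
    for n in doc_numbers:
        card |= 1 << n
    return card


def superimpose(first: int, *others: int) -> int:
    """Light passes only where every card has a hole: bitwise AND."""
    result = first
    for card in others:
        result &= card
    return result


def holes(card: int) -> List[int]:
    """List the hole positions (document serial numbers) in a card."""
    return [n for n in range(card.bit_length()) if (card >> n) & 1]


# Hypothetical term cards.
microfilm = term_card([3, 17, 42, 58])
retrieval = term_card([17, 20, 42])
magnetic = term_card([5, 42, 58])

print(holes(superimpose(microfilm, retrieval)))
print(holes(superimpose(microfilm, retrieval, magnetic)))
```

In the ordinary system the searcher reads serial numbers from the
surviving hole positions; in the Microcite concept each surviving
position would instead show a projected description of the document.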

By means of other coding techniques, coordinate indexes may also
be stored on punched cards (aperture cards fall in this class), punched
tape, COMAC cards, magnetic tape, discs, or other magnetic media, or on
microfilm. From all of these, "hard copy" indexes can be generated.
(COMAC, also known as the IBM 9900 Special Index Analyzer, is described
in "Studies in Coordinate Indexing," Volume V.)

The search means is, of course, dependent on the storage means.
Where the index is stored on peek-a-boo cards, for example, search
consists of manually comparing term cards. Where IBM punched cards are
used, decks can be compared by machine or manually. Most card systems
use EAM (electric accounting machines) for storage and retrieval, but
the COMAC and two other IBM devices, the IBM-9310 Universal Card Scanner
and the IBM-101 with Row-by-Row Scanning Attachment (a modification of
the IBM-101 Statistical Sorter), could also be used.

8.5.2  High-Speed  Magnetic  Tape  Computers 

Where the index is stored on magnetic tape, a computer
is used for the index coordinations, or a special magnetic tape searcher
can be used. Where the index is on microfilm, various photoelectric
code-sensing devices are used. Since magnetic tape, other magnetic
media, and microfilm are of special interest because of potential high
density of storage and speed of search, these will be discussed in
detail below.

There has been considerable debate over the use of high-speed
magnetic tape computers for information storage and retrieval. The
strongest adverse argument lies in the cost of this equipment. When one
is faced with a choice between a high-speed system which rents for from
$1,000 to $300,000 per month and a punched card system whose total cost
could be less than a year's computer rental, he will obviously be
inclined toward the latter. However, one must keep in mind the fact
that computers work so quickly that they are required only for short
periods, even for large collections of material. Thus, if the computer
can be used on a time-sharing basis, it can be completely practical. It
has been proposed 24 that before ADP equipment can be used in libraries,
its cost must be reduced by a factor of 10. This statement, however,
does not take into consideration conditions such as time-sharing.

Many computers do not have random access; i.e., the entire file
must be searched serially. This can be compensated for by "batching"
queries, but many users are reluctant to hold questions for even as
little as a day, claiming, and possibly with justice, that a high-speed
computer should give not only rapid but immediate replies.

Some users have put their entire reference file on magnetic tape,
but use this only for periodic printout of indexes and use the printout
for searching manually. This is economical insofar as needless
repetition in sorting, filing, etc., is eliminated, and the computer
itself updates the system. Obviously, this type of operation would be
far more attractive if the computer were also used for processing queries.

Both success and disappointment have been reported in the use of
high-speed computers for I.R. In some cases, after review of operations
it was found that by making certain changes (batching queries,
time-sharing, memory modification) the disappointment could be minimized
or eliminated.

Generally, the "permanent" index is stored on magnetic tape. In
processing or searching, however, a temporary store or memory (magnetic
core or drum) is used to hold the queries and/or program as well as
results for the search process. An important factor in I.R. systems is
the number of searches which can be run simultaneously. The IBM 7090,
which is used at the G.E. Flight Propulsion Division, Technical
Information Center, can handle as many as 1,300 questions simultaneously.

This requires, of course, powerful logic and large internal memory,
but means that the processing is done in a single run through the
magnetic tape files. This is important because, too often, powerful
computers are slowed down by tape speeds.
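
The value of holding many questions in memory during a single run
through the tape can be sketched as follows. This is a minimal
illustration, not the G.E. program: the record layout, the Query class,
and the sample data are hypothetical. The tape is modeled as a generator
so that, like a reel, it can be read only once from front to back.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, Iterator, List, Tuple

Record = Tuple[int, FrozenSet[str]]  # (document number, index terms); assumed layout


@dataclass(frozen=True)
class Query:
    all_of: FrozenSet[str]                  # terms that must all be present
    none_of: FrozenSet[str] = frozenset()   # terms that must be absent

    def matches(self, terms: FrozenSet[str]) -> bool:
        return self.all_of <= terms and not (self.none_of & terms)


def read_tape() -> Iterator[Record]:
    """Hypothetical index tape, delivered serially."""
    yield 101, frozenset({"turbine", "blade", "fatigue"})
    yield 102, frozenset({"turbine", "lubrication"})
    yield 103, frozenset({"compressor", "blade", "fatigue"})


def batch_search(tape: Iterable[Record], queries: Dict[str, Query]) -> Dict[str, List[int]]:
    """Answer every batched query in one serial pass over the tape."""
    hits: Dict[str, List[int]] = {name: [] for name in queries}
    for doc_no, terms in tape:               # the tape moves past once
        for name, query in queries.items():  # all queries are held in memory
            if query.matches(terms):
                hits[name].append(doc_no)
    return hits


QUERIES = {
    "Q1": Query(frozenset({"blade", "fatigue"})),
    "Q2": Query(frozenset({"turbine"}), none_of=frozenset({"fatigue"})),
}
RESULTS = batch_search(read_tape(), QUERIES)
print(RESULTS)
```

The tape is read the same number of times whether one question or a
thousand are held in memory, so the cost per question falls as the batch
grows; the limit is set by internal memory and logic speed rather than
by tape speed.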

Exhibit A shows several characteristics of computers in the
$5,000 to $50,000 monthly rental range. This chart was specially
prepared with I.R. applications in mind. Other computer-characteristics
charts are available; one of the most complete is that prepared by
Adams Associates.

I.R. systems differ in configuration according to different
requirements. An ADP (automatic data processing) system may have to be
modified or new ones may have to be designed to fit the I.R. requirements.
For example, the requirements for either multiple access or single
access will have a strong influence on choice of system. In some cases,
large internal storage is required; in others a large temporary memory
is demanded for processing either internal or external stores. These
problems and others, in any combination, may be present.

It is, therefore, essential that a complete systems study be made
before a computer is purchased. In the past, disappointment has been
registered in cases where this was not done or where a company already
had an ADP system and tried to force it to fit the I.R. requirements. A
systems study requires examination of hardware which can provide
sufficient capability for all functions. There is more to this than
"shopping" for a computer. High-capacity disc files and large memory
drums, for instance, are available but may not be inherent in an
otherwise appropriate ADP system. Exhibit B shows, by way of example,
characteristics of just two large-capacity random-access disc memories.
One or the other of these may best suit a particular need. Also by way
of example, characteristics of two kinds of magnetic storage drums are
shown in Exhibit C. In many systems, high-density tape systems will be
required.

If one compares the various characteristics of magnetic tape systems on
Exhibit A, he can see the advantages of mix-and-match system techniques.
The magnetic tape system of the Bendix G-20, for example, may be just
as useful with another computer. This is one of the higher-density tape
systems available, and its characteristics are outlined in detail in
Exhibit D.

Many people have been misled into believing that a computer will
not solve I.R. problems simply because they see no computer that has
solved them. The need, therefore, for systems study and systems design
cannot be overemphasized. Because these criteria have not been met, a
best solution for a given problem has not been realized.

EXHIBIT B

COMPARISON OF TWO LARGE-CAPACITY
RANDOM-ACCESS DISC MEMORIES

                        TELEX, INC.           BRYANT COMPUTER
                        MODEL XXA             PRODUCTS

Disc Diameter           31"                   39"
Rotational Speed        1200 rpm              900 rpm
Recording Surfaces      128                   39
Number of Heads         256                   234
Head Positioners        64                    1 per disc side
Tracks per Surface      256                   768
Bit Density (max.)      400 bits/inch         273 bits/inch
Track Density           25.6/inch             64/inch

Storage Capacity
  Per File              617,644,032 bits      603,857,592 bits
  Per Surface           4,825,344 bits        15,483,528 bits
  Per Track
    Zone 1              12,566 bits           11,575 bits
    Zone 2              25,132 bits           15,015 bits
    Zone 3                                    18,427 bits
    Zone 4                                    21,840 bits
    Zone 5                                    25,279 bits
    Zone 6                                    28,665 bits

Transfer Rate
    Zone 1              251,320 bits/sec      174 kc
    Zone 2              502,640 bits/sec      225 kc
    Zone 3                                    276 kc
    Zone 4                                    328 kc
    Zone 5                                    380 kc
    Zone 6                                    431 kc

Access Time             42 ms (average)       167 ms (max)
Price                   $185,000              $140,041
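
The Telex figures in Exhibit B can be cross-checked with simple
arithmetic. The sketch below assumes that the 256 tracks on a surface
are divided equally between the two zones and that a head reads one full
track per revolution; neither assumption is stated in the exhibit. Under
those assumptions the published per-file capacity corresponds to 128
recording surfaces.

```python
# Cross-check of the Telex column of Exhibit B (assumptions noted inline).
RPM = 1200
TRACKS_PER_SURFACE = 256
BITS_PER_TRACK = {"zone 1": 12_566, "zone 2": 25_132}
SURFACES = 128  # the count that reproduces the published per-file figure

# Assumption: tracks are split equally between the zones.
tracks_per_zone = TRACKS_PER_SURFACE // len(BITS_PER_TRACK)

bits_per_surface = sum(bits * tracks_per_zone for bits in BITS_PER_TRACK.values())
bits_per_file = bits_per_surface * SURFACES

# Assumption: one full track passes under a head per revolution.
revs_per_sec = RPM // 60
transfer_rate = {zone: bits * revs_per_sec for zone, bits in BITS_PER_TRACK.items()}

print(f"per surface: {bits_per_surface:,} bits")
print(f"per file:    {bits_per_file:,} bits")
print(transfer_rate)
```

The same kind of check applies to the Bryant column: 39 surfaces at
15,483,528 bits each give the listed 603,857,592 bits, and at 900 rpm
(15 revolutions per second) the per-track figures give transfer rates
close to the listed values.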

Bit Density             up to 2,000/inch
Tape Speed              up to 150 inch/sec
Number of Channels      up to 20 per inch of tape width
Interchannel Time       less than 0.2 ms at buffer output
  Displacement
Interblock Gap          as short as 0.3"; 0.75" typical for dual
                        read/write operation at 100 in/sec
Error Detection         Parity channel provides single error
                        detection
Error Correction        Single parity channel makes possible single
                        error correction
Reliability
  Transient Error Rate  1 in 10^7 to 10^8 max. at 1500 ppi
  Permanent Error Rate  1 in 10^8 to 10^9 max. at 1500 ppi
Reread Time to Recover  less than .005% of on-line time at 1500 ppi
  Transient Errors

It has been suggested that certain deficiencies must be tolerated
in I.R. systems in order to expend only reasonable funds. This was
pointed out in arguing for the general-purpose computer as opposed to
special devices like the COMAC which, it was said, have limited
application.27 With this we cannot totally disagree, but we must insist
that even in using a general-purpose computer, one need not take second
best just because it exists, since first best might be effected by
system modification and/or time sharing.

This has been but a very brief summary of the possible applications
of magnetic tape computers to coordinate indexing systems. In the
appended references are cited several items which contain additional or
more detailed information.28,29,30

In Section 9 of this report the operating experience of several
facilities which use magnetic tape computers is detailed.

8.5.3 Other Magnetic-Media Systems

Magnetic Tape Searchers

Several efforts are underway to develop file-searching
devices based on magnetic tape systems. These are attempts to make
special-purpose devices which will do the same processing for I.R. purposes
that a computer does, but at a lower cost. It has been reported 31 that
most of these developments are too high priced to compete with several
moderately priced computers; an exception noted is Herner's Tape
Searcher (approximately $10,000).


Some of the tape systems under development are:

Logic Processor (Aeronautics)

Index Searcher (Computer Control Co., Inc.)

Univac Tape Searcher (Remington-Rand)

Findafact (Rese Engineering Co.)

Tape Searcher (Herner & Co.)

The GE-250 Information Selector was designed and developed for
Western Reserve University's Center for Documentation and Communication
Research. However, in order to meet the required installation date,
the GE-225 was delivered. The GE-225 is a transistorized general-purpose
digital computer with a special programming feature which allows the
WRU specifications to be met. Work on the GE-250 has been dropped.

The Magnacard system (Magnavox Company) stores data on 1" x 3"
magnetic cards. A single card has a capacity of 1000 decimal digits or
600 alphanumeric characters. There is provision for large-scale
processing of file items and for random access to individual items.
This equipment, like special magnetic tape searchers, attempts to
perform the same I.R. operations that a computer can but to do so at
lower cost.

Magnacards, with microfilm attached, are also being used in the
Magnavue system (Magnavox Company).

Manufacturer's literature is available on all these systems.


8.5.4 Microfilm Systems

There are basically three types of microfilm systems
used in coordinate indexing: those which use pieces of microfilm
inserted in or pasted on cards, those which use microfilm in conjunction
with magnetic media, and those which use "chips" or reels of microfilm
with photo-optic codes.

The first of these will not be discussed in detail here since such
systems are treated as punched cards (e.g., aperture cards used in
systems like Filmsort) or are purely manual systems. The Filmsort
Company (Division of Minnesota Mining & Mfg. Co.) aperture cards are
punched cards on which microfilm is mounted. These are sorted, collated,
etc., on standard EAM equipment. (See Paragraph 8.5.1.)

The second of these is somewhat similar to the first in that the
microfilm is attached to a card, but this card is of magnetic material
(such as Magnacard) and is processed much like magnetic tape. (See
Section 8.5.) An example of this configuration is Magnavue
(Magnavox Company).

There are two types of the third system. In one case, the microfilm
system is used only for text storage. This is of interest here
only in that some of these systems do have "read" devices. For example,
in MEDIA (Magnavox Company) and in FLIP (Benson-Lehner Corp.), item
numbers are coded, and the document can be mechanically retrieved by
this number. These systems at present do not provide mechanical
retrievability via an index.

(A great deal of work is being done on microfilm systems for text
storage, both in increasing density of storage and in making rapid
access to documents possible. Much of this work is being sponsored by
the Council on Library Resources. The AVCO Corporation is developing,
with Council support, a system based on microfilm sheets of documents,
with as many as 10,000 on a sheet, with 100:1 reduction of page size.)

In the second case, the microfilm is used both as a storage
medium and as a coordinate index. Here, the documents are retrieved
via the index. In the Rapid Selector, for example, index terms are
encoded on punched EAM cards which are photographed onto reels of film,
with the indexed document immediately following. The documents are
retrieved by means of a punched interrogation card and a patch panel
which specify (a) the search criteria and (b) the logical relationships
required. The information store is then moved past the photoelectric
cells of comparator circuits. When search requirements have been
satisfied, a copy circuit is activated, and microfilm copies of the
selected documents are made. The average time for a complete search,
including processing, as reported by the Bureau of Ships, is 12
minutes.

There are several other systems based on this principle of
photoelectric comparison. Some use different coding means (though
all rely on opaque and transparent markings indicating binary
information) while others employ microfilm cards rather than reels of
film. Some of these other systems are:

Filmorex (Filmore & Co., France)

Minicard (Eastman Kodak Company)

Filesearch (FMA, Inc.).
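
The photoelectric comparison described for the Rapid Selector can be
pictured as a serial scan in which each frame's pattern of opaque and
transparent marks is compared with the interrogation patterns. The
sketch below is an illustration only: the bit positions, the "and"/"or"
relationship names, and the function names are hypothetical, and no
attempt is made to model the actual patch panel.

```python
from typing import Iterable, Iterator, List, Tuple

Frame = Tuple[int, str]  # (code pattern as bits, label of the document that follows)


def pattern(*positions: int) -> int:
    """Build a code pattern with a transparent mark at each position."""
    mask = 0
    for p in positions:
        mask |= 1 << p
    return mask


def select(reel: Iterable[Frame], criteria: List[int], relationship: str) -> Iterator[str]:
    """Move the reel past the comparator and copy the documents selected."""
    if relationship not in ("and", "or"):
        raise ValueError("relationship must be 'and' or 'or'")
    combine = all if relationship == "and" else any
    for code, document in reel:
        # A criterion is satisfied when every one of its marks is present.
        if combine((code & c) == c for c in criteria):
            yield document  # stands in for the copy circuit


# Hypothetical reel: three coded frames, each followed by its document.
REEL = [
    (pattern(1, 4), "doc A"),
    (pattern(4, 7), "doc B"),
    (pattern(1, 4, 7), "doc C"),
]

print(list(select(REEL, [pattern(1), pattern(7)], "and")))
print(list(select(REEL, [pattern(1), pattern(7)], "or")))
```

As with tape, every frame must pass the comparator, so search time
depends on reel length rather than on the number of documents selected.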

Manufacturer's literature is available on these devices, as well
as on MEDIA and FLIP.

Additional information on microfilm systems can be found in a
survey report prepared by System Development Corp.

8.6 Display or Printout

In the publication or other dissemination of indexes,
references, abstracts, etc., a major shortcoming has been in the
preparation of graphic-arts quality printout from processing devices
such as those described in Section 2.

8.6.1  Typewriter-Printers 

Output devices of data processing equipment are
generally typewriters, paper-tape punches, or card punches. Some systems,
like the IBM 870 Document Writing System, use all three. Various
converters are used to transfer information from one form to another,
e.g., tape-to-tape converters.


Typewriting can be generated on cards, sheets of paper, labels,
preprinted forms, etc. Such cards have been widely used in the
"shingling" type of copy preparation with Listomatic cameras and for
other camera copy.

The disadvantages of typewriter output devices now in use are that
only one or a limited number of type fonts can be used and that no
provision is made for proportional spacing and other graphic-arts
printing requirements.

Typewriter printers operate at speeds up to 1000 lines per minute
(tape and card punches are slower). This is, however, not as fast as
many computers can operate. IBM, Remington-Rand, and others are working
on the development of faster typewriters and punches.

Examples of kinds of output required from mechanized systems and
descriptions of means of producing indexes, etc., are given in Section 9
of this report, in which systems operation and operating experience are
detailed.

A display system of some interest is not reported on the charts.
This is a means for projection of visual data onto a screen either for
viewing or for recording (producing hard copy). It is a xerographic
technique which may find commercial application in the computer field.
It is described in detail by Mott, Clark & Dessauer in Photographic
Science & Engineering. This is the technique of the PROXI system
(Projection by Reflection Optics of Xerographic Images) (Haloid Xerox,
Inc.).

Loewe, Sisson and Horowitz have published a summary of display
techniques, including CRT, photographic, electrostatic, oil film, and
thermoplastic. Basic principles and typical characteristics are given.
There is an extensive, comprehensive bibliography. The paper also
discusses user requirements.

It should be noted that graphic-arts printers are desirable
primarily for printing references, citations, abstracts, or text. They
are usually not required for printing out search results (item addresses)
or for printing out simple indexes. Therefore, printing of high quality
will only be a truly practical endeavor when the cost of input of
references and citations, as well as printing instructions, is
considerably reduced. (See Paragraph 8.4.)

8.6.2 Automatic Composing Machines. Character Generator/Printers

Other techniques are now being widely investigated for
graphic-arts quality page composition; they fall into two major
categories: (a) automatic composing machines and (b) character
generator/printers. While the former meet or exceed graphic-arts
requirements, the latter lack clarity and diversity of type fonts. On
the other hand, the latter operate at much higher speeds.

Summarized in Exhibit E are characteristics of some representative
automatic composing machines. In Exhibit F are summarized some
characteristics of representative character generator/printers. These
charts list potentially applicable printers, but are not intended to be
exhaustive. The information was obtained primarily from
manufacturer-provided literature.

It will be noted that automatic composing machines, though not
actuated by computer tapes, can be tape-actuated. It may be expedient
to use tape translators or to modify the printers' tape acceptance
equipment and use several machines in parallel, thus compensating for
their slower speeds. This is an economically feasible approach since
costs of electronic printing systems such as described in Exhibit F
range from $250,000 to $600,000. On the other hand, the cost of an
automatic typesetting or photocomposition unit ranges from $20,000 to
$60,000. Thus as many as ten of the latter could be used at the same
cost.
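
The cost comparison above is simple division, and the sketch below
works it through for the price ranges quoted. Only purchase cost is
compared; operating cost and the relative speeds of the two classes of
machine (given in Exhibits E and F) are not modeled, and the function
name is hypothetical.

```python
# Price ranges quoted in the text, in dollars (low, high).
ELECTRONIC_PRINTER_COST = (250_000, 600_000)
COMPOSING_MACHINE_COST = (20_000, 60_000)


def machines_for_same_cost(printer_cost: int, composer_cost: int) -> int:
    """Whole composing machines obtainable for the price of one printer."""
    return printer_cost // composer_cost


low_end = machines_for_same_cost(ELECTRONIC_PRINTER_COST[0], COMPOSING_MACHINE_COST[0])
high_end = machines_for_same_cost(ELECTRONIC_PRINTER_COST[1], COMPOSING_MACHINE_COST[1])
fewest = machines_for_same_cost(ELECTRONIC_PRINTER_COST[0], COMPOSING_MACHINE_COST[1])
most = machines_for_same_cost(ELECTRONIC_PRINTER_COST[1], COMPOSING_MACHINE_COST[0])

print(low_end, high_end, fewest, most)
```

Comparing like with like (low end against low end, high end against
high end) gives ten to twelve machines, which agrees with the figure of
ten in the text; pairing the cheapest printer with the dearest composing
machine would give only four.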

MIT has reported success in programming its computer for printout
on the Photon Photosetter. All printing instructions (font, spacing,
etc.) must be included in programming.

9.

9.1 Introduction

Described in this section of this report are four activities which
represent four distinct and different implementations of the principles
of coordinate indexing:

(1) a manual, edge-notched card system
    (uses term on item files) (National
    Conference on Social Welfare)

(2) a mechanized system using a large-
    scale general-purpose computer, but
    having no recourse to aids such as
    thesauri, roles, links, etc. (item
    on term) (G. E. Evendale)

(3) a mechanized system using a relatively
    modest general-purpose computer, and
    using a thesaurus, roles, and links
    (term on item) (Western Reserve
    University)

(4) a mechanically (IBM 1401) prepared
    index for manual use (item on term)
    (Documentation Incorporated, Index to
    Chemical Patents)*

Where possible, size and cost are indicated. Descriptions include
details on vocabulary generation and control, on user requirements, and
on analysis or studies concurrent with or leading to system
implementation.
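
The two file organizations named in the list above, term on item and
item on term, hold the same information arranged in opposite directions.
The sketch below shows one being derived from the other and then used
for a coordinate search. The item numbers, terms, and function names are
hypothetical and are not drawn from any of the four systems.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Term-on-item file: each item record carries its index terms.
TERM_ON_ITEM: Dict[int, Set[str]] = {
    1: {"adoption", "agency", "casework"},
    2: {"agency", "budget"},
    3: {"adoption", "casework", "legislation"},
}


def invert(term_on_item: Dict[int, Set[str]]) -> Dict[str, Set[int]]:
    """Build the item-on-term file: each term record lists its items."""
    item_on_term: Dict[str, Set[int]] = defaultdict(set)
    for item, terms in term_on_item.items():
        for term in terms:
            item_on_term[term].add(item)
    return dict(item_on_term)


def coordinate(item_on_term: Dict[str, Set[int]], *terms: str) -> List[int]:
    """Logical product: items posted under every one of the given terms."""
    if not terms:
        return []
    postings = [item_on_term.get(term, set()) for term in terms]
    return sorted(set.intersection(*postings))


ITEM_ON_TERM = invert(TERM_ON_ITEM)
print(sorted(ITEM_ON_TERM["agency"]))
print(coordinate(ITEM_ON_TERM, "adoption", "casework"))
```

A term-on-item file must be scanned item by item to answer a question,
while an item-on-term file answers it by comparing only the term records
involved; the price is that adding one item means posting it under every
one of its terms.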


* In addition, brief descriptions of three other studies conducted at
Documentation Incorporated are included to illustrate recent
developments.

It is worthwhile noting that although each of these groups uses
different equipment, different modes of indexing, different indexing
aids, and has very different costs, each feels that it is accomplishing
what it intends to do in an economical and effective way. This is not
to say that any of them is completely satisfied, however. Each strives
for improvement and cost reduction.

It must be recognized that each of these groups is facing a
particular user requirement and that it strives to meet that particular
user requirement.

Each system must be judged, therefore, on its own merits, and not
in comparison with another. That each may learn from the other and may
profit by the other's errors is, of course, self-evident.

All reports have been corrected and verified by the system
operators.



9.2 National Conference on Social Welfare

9.2.1 Background and General Information

The information given herein is a summary of the material
contained in the two following publications and is supplemented by
information gleaned during a visit to the National Conference on Social
Welfare in Columbus by the Documentation Incorporated study group.

Hoffer, J. R. Information Retrieval in Social
Welfare: Experience with an Edge-Notched Information
Retrieval System. Paper presented at Tenth
Annual Meeting of the ADI, Boston, Mass., Nov. 6,
1961.

Hoffer, J. R. Manual for a Hand-Sort Punch-Card
System for Indexing Social Welfare Publications.
National Conference on Social Welfare, Columbus,
Ohio. May 1, 1961.

The projects in which the NCSW hand-sort punch-card system was used
were:

1. Indexing of Annual Forum Proceedings 1955-1959.
   A pilot study to test the system with 203 manuscripts
   included in the NCSW publications for a five-year
   period. The Conference has approximately 5,000
   documents in printed form.

2. Anatomy of the Twin Cities Annual Forum, May
   1961. An analysis of 224 meetings held during
   the 88th Annual Forum of the National Conference.

3. Indexing of Conference Library Publications.
   Approximately 300 high utility reports, journal
   articles, and manuscripts were classified and
   indexed.

These three projects are varied in content and scope but the
problem is identical, i.e., information retrieval by coordinate indexing.
The same "descriptors" were used for all three projects, but the
projects were kept in separate files.

9.2,2  tom Umm 

The NCSW punch card system is built on the Zatoperk System
(Zator Company, Cambridge, Mass.). The cards are hand-punched along
the edges. At the top, under one row of holes, are two rows of the
alphabet, the first letter of the lower row beginning under the fourth
letter in the top row. This part of the card is called the name cipher,
because it is used to code authors' names and other identifying
information. Around the other three edges of the card is a single row of
marginal holes, numbered 1 to 65. These are used to code the 50
descriptors and 12 categories which may be used.

Each card represents one item, or document, and its index terms.
There is space on the card for an abstract or reference. NCSW is
presently using only bibliographic references.

The  cards  are  manually  coded  with  an  ordinary  hand  punch. 

An  "Ice-pick"  or  "needle"  type  tool  Is  used  In  retrieving  infor¬ 
mation.  If  the  pick  Is  inserted  through  the  deck  of  cards  at  a 
particular  term  "hole",  all  Items  Indexed  by  that  term  will  "fall  out" 
of  the  deck.  With  two  picks  a  logical  product  can  be  obtained,  l.e., 
only  cards  which  have  "this  term"  and  "that  term"  will  fall  out. 
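
The needle sort just described can be sketched in a few lines. This is
only an illustration under stated assumptions: each card is modeled as
the set of hole numbers notched for it, and the card names and hole
assignments are hypothetical (the descriptor numbers are taken from the
NCSW code sheet).

```python
from typing import Dict, List, Set

# Each card is modeled as the set of hole numbers notched for its index
# terms. Card names and hole assignments are hypothetical.
Deck = Dict[str, Set[int]]


def needle_sort(deck: Deck, holes: List[int]) -> List[str]:
    """Return the cards that fall out when a pick is run through each hole.

    A card drops only if it is notched at every needled position, which
    is the logical product (AND) of the corresponding terms.
    """
    return sorted(card for card, notched in deck.items()
                  if all(h in notched for h in holes))


deck: Deck = {
    "card-A": {4, 20},       # 4 Children; 20 Health & medical
    "card-B": {20, 38},      # 20 Health & medical; 38 Research & studies
    "card-C": {4, 20, 38},
}

one_pick = needle_sort(deck, [20])       # every card coded with descriptor 20
two_picks = needle_sort(deck, [4, 20])   # only cards coded with both 4 and 20
```

With one pick at hole 20 all three sample cards drop; with picks at
holes 4 and 20 only the cards notched at both positions drop.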


In such a system, the vocabulary is necessarily limited.
In this case there is room on the card edge for only 69 terms. In a
system like WRU's or G.E.'s (which are described in the next two
sections), the vocabulary is "limitless" and new terms can be introduced
with ease. Those systems have approximately 7,000 terms and grow at
will. In the NCSW system, however, the vocabulary had to be
predetermined and must remain fixed.

The NCSW vocabulary was generated only after considerable debate
and discussion among several persons in social welfare. The finally
generated vocabulary has not been entirely satisfactory, as will be
explained in Paragraph 9.2.5.

The Manual, referenced above, goes into considerable detail in
defining and limiting the scope of the descriptors. An example is
given below to illustrate the amount and kind of information that may
be covered by each descriptor.

20.  Health and medical

     Services designed to prevent and control diseases or to
     promote health

     casework in hospital; chronic illness; health education;
     maternal care; medical assistance; medical care; nursing;
     occupational therapy; patients; physical disability;
     public health; rehabilitation of the physically handicapped;
     sanitation; social aspects of illness; social hygiene;
     social work in secondary settings. See also: #24 - hospitals,
     residential treatment centers; #27, #36, and #37.

There is also an indexing and searching aid which lists, in
alphabetic order, the terms actually used in documents, and indicates the
descriptor by which the term is coded. An example is given below.

health and welfare councils
See #8

health education
See #20
See also #46

health insurance
See #33

heart disease
See #20

Hinduism
See #40
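
The searching aid amounts to a lookup table from document terms to
descriptor numbers. A minimal sketch follows; the five entries are the
ones shown in the sample above, while the names `Entry`, `AID`, and
`lookup` are hypothetical and introduced only for illustration.

```python
from typing import Dict, List, NamedTuple, Optional


class Entry(NamedTuple):
    see: int              # descriptor number under which the term is coded
    see_also: List[int]   # further descriptors the aid points to


# Entries copied from the sample of the searching aid shown above.
AID: Dict[str, Entry] = {
    "health and welfare councils": Entry(8, []),
    "health education": Entry(20, [46]),
    "health insurance": Entry(33, []),
    "heart disease": Entry(20, []),
    "hinduism": Entry(40, []),
}


def lookup(term: str) -> Optional[Entry]:
    """Return the coding entry for a document term, or None if unlisted."""
    return AID.get(term.strip().lower())
```

An indexer or searcher would punch (or needle) the hole given by `see`
and consult the `see_also` descriptors as needed; a term that is not
listed returns `None` and is left to the indexer's judgment.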

The total list of descriptors and categories is shown on the code
sheet reproduced on the following page.

EXHIBIT A

NCSW CODE SHEET  9/15/61
Information Retrieval - NCSW Publications

Coded by:

1.  Name of Author
2.  Title of Article
3.  Publisher
4.  Year and Source

Descriptors and Categories

Descriptors (circle all appropriate)

 1. Adm. & Org.                                 26. Leisure & recreational
 2. Adults                                      27. Mental health & mental illness
 3. Casework & guidance                         28. Minority groups
 4. Children                                    29. National
 5. City & urban                                30. Neighborhood
 6. Communications                              31. Personnel
 7. Com. developmt.                             32. Philosophic
 8. Com. org.                                   33. Preventive & protective
 9. Conferencing                                34. Private service & practice
10. Corrections                                 35. Professions & relat. fields
11. Dependency                                  36. Psychiatric & psychol.
12. Discrimination                              37. Rehabilitative & mult. serv.
13. Economic factors                            38. Research & studies
14. Educ.-academic                              39. Rural & agricultural
15. Educ.-informal                              40. Sectarian
16. Familial and sexual                         41. Soc. policy & action
17. Financing                                   42. Social wk. practice
18. Governmental                                43. Societal
19. Group work                                  44. Socio-cult. factors
20. Health & medical                            45. State & reg.
21. Historic                                    46. Teach. & learning
22. Human growth & behav.                       47. Volun. agency
23. Information retrieval                       48. Volunteers
24. Institutional and building cent. programs   49. Youth
25. International                               50. Omnibus, no or other

Categories (circle only if major importance)

51. Values                                      57. Provision & management
52. Knowledge                                   58. Services
53. Purposes                                    59. Spec. Prob. Groups
54. Methods                                     60. Age groups
55. Auspices                                    61. Settings
56. Problems                                    62. Geog. boundaries

1/ See Manual for a Hand-Sort Punch-Card System, Appendix 3, for Definitions

REPRINTED WITH PERMISSION

9.2.4  XI Jjfli  w4 

The average time to process a document of approximately
3,500 words and to prepare the cards is estimated at 27-30 minutes
(coding = 15 minutes) at an approximate cost of $1.00 per document.

9.2.5  Evaluation of System

The three projects so far, according to Mr. Hoffer,
indicate that a hand-punch system for indexing social welfare
publications with a limited number of descriptors has value for retrieving
information in social welfare. This was stated in the ADI paper
referenced above, as was the following enumeration of limitations and
difficulties and areas requiring further study.

"The original list of 'descriptors' was not
entirely adequate. (Some revisions were made
during the projects and have been made since.)
The list needs further testing especially with
documents published outside the United States,
and for high frequency major 'descriptors' and
low frequency minor 'descriptors'.

"Limiting the number of descriptors to 50 may be
too restrictive for a direct coding system. (In
similar systems in some scientific and technical
fields from 250 to 300 descriptors are used.) It
results in selection of a large number of cards
on the first sort, with resulting need for
additional sorts to locate sources of data on
topics of limited scope.

"The 'categories' were not adequately tested to
determine whether they were comprehensive and
mutually exclusive, whether they were valid
selections and appropriately stated, and whether
they should be coded. (Question might be raised
whether holes assigned to the categories might
be better used by enlarging the list of
'descriptors'.)

"The possibilities of using a more complex system
than 'direct coding' might appropriately be
explored. If seriously considered, both the
glossary or dictionary of terms and the type of
card used would have to be re-examined.

"How detailed should the indexing become, i.e.,
deep indexing."

Mr. Hoffer also pointed out in that paper that this system
probably has greatest value for a general or generic library or
collection such as social planning and public welfare. It has not been
determined whether it would have high value in such specialized areas
as child welfare, corrections, psychiatric social work or other
specializations in which the user may wish greater refinement or
technical analysis.

9.3  G.E. (Evendale) Flight Propulsion Division, Technical Information
Center.

9.3.1  Background  and  General  Information 

Several papers have been published which describe the
activities and progress at the G.E. Technical Information Center. The
information given herein is a summary of material contained in such
papers, especially the following two, and is supplemented by information
obtained during a visit to Evendale by the Documentation Incorporated
study group.

Dennis, B. K. Dissemination Via the Automated
Technical Information Center. Publication property
of American Chemical Society, presented at the
140th ACS National Meeting, Division of Chemical
Literature, September 4, 1961.

Dennis, B. K. General Electric's Automatic
Information Retrieval System. Presented to Special
Libraries Association, Battelle Memorial Institute,
Columbus, Ohio, April 4, 1961.

The G.E. Technical Information Center developed from a
passive-type technical library into an active information center. This
transition necessitated not only that there be means of rapid access to a
large file of scientific and technical documents, but also that there
be methods of rapidly and effectively disseminating information on both
a regular and demand basis.

The Center, therefore, established a manual Uniterm coordinate
index. By 1957, however, the index encompassed over 20,000 documents

Access to the book collection of the library is still by way of a
conventional card catalog.

The information in the mechanized system consists of technical
reports and memoranda generated internally and also that obtained from
ASTIA, of technical society papers, journal and trade press articles,
foreign and U.S. patents, translations, and miscellaneous scientific
and technical information.

The system is based on the use of Uniterms and document file
numbers. Each document in the system may have 20 or more words to
describe it. In addition, a concise (30-50 words) descriptive abstract
is prepared for each document in the system.

The index is inverted, i.e., is an item-on-term system.
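
The difference between a term-on-item file and an inverted
(item-on-term) file can be shown with a short sketch. The document
numbers and Uniterms below are hypothetical, and the function name
`invert` is introduced only for illustration; nothing here is taken
from the G.E. programs.

```python
from collections import defaultdict
from typing import Dict, List


def invert(term_on_item: Dict[int, List[str]]) -> Dict[str, List[int]]:
    """Turn {document number: [terms]} into {term: [document numbers]}.

    Documents are visited in ascending order, so each posting list comes
    out sorted by document number.
    """
    item_on_term: Dict[str, List[int]] = defaultdict(list)
    for doc_no in sorted(term_on_item):
        for term in set(term_on_item[doc_no]):   # ignore repeated terms
            item_on_term[term].append(doc_no)
    return dict(item_on_term)


# Hypothetical term-on-item records: one entry per document.
records = {
    101: ["TURBINE", "BLADE"],
    102: ["TURBINE", "COOLING"],
    103: ["BLADE", "COOLING", "TURBINE"],
}

inverted = invert(records)
```

In the inverted form a search touches only the posting lists of the
question terms rather than every document record.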


The document index and abstract are placed on magnetic tapes for
use in electronic computing equipment. There are approximately 60,000
document abstracts recorded on six tapes; there are over 7,000 words
describing the documents; and there are more than 900,000 access points
on the system's master tape. Over 1,000 new documents are abstracted
and placed on the tape each month.

Originally, the retrieval system was programmed for the IBM 704.
During early 1961, the I.R. programs were rewritten for the IBM 7090. The
7090 is used in conjunction with two 1401's. It should be noted that
the principal criteria for choice of these computers were that they
were located at the Flight Propulsion Division and that computer time
was available to the Technical Information Center.

As part of its dissemination program, the Center publishes a
weekly announcement bulletin, TIPS. The same punched cards which are
used to update the Center's automated retrieval system are also used to
produce the multilith masters from which TIPS is reproduced.
Additionally, the punched cards are used for the masters for the Center's
conventional catalog cards and for document loan cards. At this time,
documents are announced in TIPS in twelve broad subject categories,
with a document assigned to only one. The use of the IBM 1401 with a
keyword-in-context program for providing an index for the announced
material is being considered.

9.3.2  Machine  Programs 

The machine programs of the Automatic Information Retrieval
System are set up in two basic parts. First is the coordination (logical
product) part of the search system. Here, key words selected by the
searcher are located on tape and their access numbers compared. As
many as 1,200 machine questions can be handled on one run.
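
The comparison of access numbers in Part I can be illustrated by
merging sorted posting lists, much as a sequential tape file would be
read. This is a sketch under assumptions, not the G.E. program: the
index contents are hypothetical and the function names `coordinate` and
`search` are introduced here only for illustration.

```python
from typing import Dict, List


def coordinate(a: List[int], b: List[int]) -> List[int]:
    """Logical product of two ascending lists of access numbers."""
    i = j = 0
    common: List[int] = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            common.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return common


def search(index: Dict[str, List[int]], key_words: List[str]) -> List[int]:
    """Coordinate all key words of one machine question."""
    if not key_words:
        return []
    result = index.get(key_words[0], [])
    for word in key_words[1:]:
        result = coordinate(result, index.get(word, []))
    return result


# Hypothetical inverted file: key word -> ascending access numbers.
index = {
    "TURBINE": [101, 102, 103, 250],
    "BLADE": [101, 103, 300],
    "COOLING": [102, 103],
}
```

Each pass reads both lists once from front to back, which is why
keeping the postings in access-number order matters for a tape file.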

The Part I magnetic tape file now contains over 5,000 key words
(majors) describing more than 60,000 documents. The average depth of
[illegible] yet on magnetic tape. The total inverted file now occupies
about 1,200 feet (one-half reel) of high density tape. The input tape
contains program instructions, additions to the Uniterm file and/or the
search questions, in that order. Search and file maintenance of the
Uniterm file may be performed concurrently. However, the file updating
is scheduled semimonthly and searches are performed upon request.
Whenever an updating run is being made, a new Part I magnetic tape is
produced.

Part II of the Automatic Information Retrieval System is an
abstract look-up program. Ten thousand abstracts and their citations
are filed on one reel of magnetic tape. Thus, for the file of nearly
60,000 documents, there are six tapes of abstracts. During this part of
the machine run, abstracts identified by access numbers found during the
Part I search are located and transferred to an output tape. The Part II
program searches two abstract tapes simultaneously for abstracts, and
automatically progresses to another abstract tape when finished with a
tape. The program has the ability to edit the Part I results as to
groups (searches) or ranges of access numbers, at the searcher's
discretion. A complete Part I and Part II search results in an output
tape which contains access numbers, abstracts, and customer identification
information. Contents of the output tape are read and printed by a 1401
computer.
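
The Part II look-up can be pictured as grouping the access numbers
from Part I by abstract reel and then working through the reels two at
a time. The sketch below rests on an assumption not stated in the
source: that access numbers run consecutively from 1, so that reel 1
holds numbers 1 to 10,000, reel 2 holds 10,001 to 20,000, and so on.
All function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

ABSTRACTS_PER_REEL = 10_000   # from the text: ten thousand abstracts per reel


def reel_for(access_no: int) -> int:
    """Reel number holding an abstract (assumes consecutive numbering from 1)."""
    return (access_no - 1) // ABSTRACTS_PER_REEL + 1


def plan_lookup(access_numbers: Iterable[int]) -> Dict[int, List[int]]:
    """Group the access numbers found in Part I by reel, in ascending order."""
    plan: Dict[int, List[int]] = defaultdict(list)
    for number in sorted(set(access_numbers)):
        plan[reel_for(number)].append(number)
    return dict(plan)


def tape_pairs(reels: Iterable[int]) -> List[Tuple[int, ...]]:
    """Reels to mount together, two at a time, as Part II is described."""
    ordered = sorted(reels)
    return [tuple(ordered[i:i + 2]) for i in range(0, len(ordered), 2)]


plan = plan_lookup([25_000, 3, 25_000, 10_001])
pairs = tape_pairs(plan)
```

Reels with no requested abstracts never appear in the plan, so only the
needed tapes are scheduled.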

Output from Part II (abstracts) is optional. If only a printed list
of access numbers will suffice, the results in Part I may be transferred
to an output tape, then to be printed off-line by the 1401. Or the
output from Part I may be used as input to Part II. Another option
available in Part I is a "hold", whereby results of the Part I run may be
printed out and a summary tape preserved until the printout has been
reviewed. Thus, adjustment can be made before going into Part II.

Although the Automatic Information Retrieval System is, to a great
extent, a high-speed mechanized version of a manual Uniterm coordinate
index, it is at the same time somewhat more versatile than might be
implied. For example, in Part I, there is the unlimited ability to
relate search questions, which provides an effective "or" and serves to
eliminate duplicate accession numbers. To avoid an unreasonably large
output of abstracts on some particular question, the Part II abstract
printout can be limited. An important option is the ability to
exercise access-number high-low limits and range control. Since
there is a high correlation between access number and date of entry
into the file, this control gives the ability to vary chronologically
the output of the search. Also, it enables providing a current-awareness
service. In a two (or more) term question, should one of the Uniterms
cause the net coordination to go to zero, a "no-blank sort" feature is
used to avoid this condition by giving a printout on the remaining
terms in the question.
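
The three options described above (relating questions for an "or",
limiting by access-number range, and the "no-blank sort") can be
sketched as follows. This is one possible reading of the description,
not the G.E. logic: in particular, the sketch assumes the "no-blank
sort" takes the terms in alphabetical order and skips any term that
would reduce a non-empty result to nothing. The index contents and
function names are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Set


def related_or(results: Iterable[Iterable[int]]) -> List[int]:
    """Combine several question results; duplicates are eliminated."""
    return sorted(set().union(*results))


def within_limits(access_numbers: Iterable[int], low: int, high: int) -> List[int]:
    """Keep only access numbers between the low and high limits."""
    return [n for n in access_numbers if low <= n <= high]


def no_blank_sort(index: Dict[str, List[int]], terms: List[str]) -> List[int]:
    """AND the terms alphabetically, skipping a term that would give zero."""
    result: Optional[Set[int]] = None
    for term in sorted(terms):
        postings = set(index.get(term, []))
        if result is None:
            if postings:
                result = postings
            continue
        narrowed = result & postings
        if narrowed:                 # a term that blanks the result is skipped
            result = narrowed
    return sorted(result or [])


index = {
    "BLADE": [101, 103],
    "COOLING": [102, 103],
    "TITANIUM": [],
    "TURBINE": [101, 102, 103],
}
```

Because access numbers correlate with date of entry, `within_limits`
with a recent low limit approximates a current-awareness search.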

9.3.3  Indexing and Quality Control

Because the index terms are punched in threes on the IBM
cards, terms are restricted to 18 characters. Therefore, the vocabulary
includes practically no "bound" terms. Indexers (technical abstracters)
are generally familiar with the vocabulary of the system and, for the
most part, use only terms that actually appear in the document.
Synonymity and generic levels are taken into account less by the indexers
than by the searchers, who formulate the questions after discussion
with the client and with the vocabulary of the system in mind.

It should be noted that in searching, there is no intermediate
code look-up required, because the terms are entered on the tapes
directly from the alphanumeric punched cards.

Work is in progress in editing the existing vocabulary. The
possible generation of a thesaurus is being investigated. There is a
hope that a thesaurus, if necessary, may be kept quite simple; this
seems likely in view of the adequacy of the present methods, i.e.,
relying on the searcher's judgment and knowledge of the system. The
A.I.Ch.E. thesaurus, as well as others, is being studied as a possible
model.

The machine system itself is used for system efficiency. For
example, a useful tool for the literature searcher is the
machine-tabulated alphabetical list showing frequency of use (posting)
of the term.

A key word that is not in the file will not be accepted as a new
term unless a prescribed procedure is followed. All such rejected terms
are noted on-line during the Uniterm file update. This serves to
control the growth of the vocabulary.
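
The vocabulary control and the posting-frequency list can be sketched
together. The source does not say what the "prescribed procedure" is;
in this sketch it is represented by a hypothetical set of approved new
terms passed to the update. All names and sample data are assumptions
made for illustration.

```python
from typing import Dict, FrozenSet, Iterable, List, Tuple


def update_file(index: Dict[str, List[int]],
                additions: Iterable[Tuple[str, int]],
                approved_new_terms: FrozenSet[str] = frozenset()) -> List[str]:
    """Post (term, access number) pairs; return the terms that were rejected.

    A term absent from the file is posted only if it has been approved.
    """
    rejected: List[str] = []
    for term, access_no in additions:
        if term not in index and term not in approved_new_terms:
            rejected.append(term)          # noted during the update
            continue
        postings = index.setdefault(term, [])
        if access_no not in postings:
            postings.append(access_no)
            postings.sort()
    return rejected


def posting_list(index: Dict[str, List[int]]) -> List[Tuple[str, int]]:
    """Alphabetical list of terms with their frequency of use (postings)."""
    return sorted((term, len(postings)) for term, postings in index.items())


index = {"BLADE": [101], "TURBINE": [101, 102]}
rejected = update_file(
    index,
    [("TURBINE", 103), ("TURBINES", 103), ("COOLING", 103)],
    approved_new_terms=frozenset({"COOLING"}),
)
```

Here the unapproved variant "TURBINES" is rejected and reported, while
the approved new term "COOLING" enters the file.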

9.3.4  Costs

The Center has not made a complete study of input costs,
i.e., indexing, abstracting, etc., and is not yet willing to assign a
"cost per item" figure.

It should be noted that the Center actually sells its services;
most of its customers are in G.E. Sales are negotiated, however, and
the customer must be convinced that the results are worth the cost. For
typical searches, and where the customer is willing to wait for a few
days until his question can be run on the machine with others, a
flat-rate price of $75.00 has been assigned.

In the report, "General Electric's Automatic Information Retrieval
System", referenced above, Mr. Dennis detailed some of the costs of
searching; he has updated the figures for this report.

"Every minute spent by the IBM 7090 grinding out a
literature search costs the Technical Information
Center $6.00. However, when one considers just
what the machine accomplishes during that minute
this cost shrinks into proper relationship ... If
we assume that about eight key word questions will
be required to describe one customer's question on
the machine and if we search only and do not update
the file while searching, in about three minutes
machine time we can search not only these eight key
word questions, but 1,200 such combinations. At
the rate of eight key word questions per customer
question and assuming an average of 100 searches
per run, we can operate with at least twelve
customers on a full machine run during the three
minute period of time. If we assume further that
the literature searcher will require about 30
minutes per customer to set up the machine questions
and if we assign a typical engineering rate of $10.00
per hour to the searcher then we find that for a total
labor plus machine cost of between $80.00 and $90.00
we can conduct 12 machine literature searches
simultaneously. It is probably more likely that in a
manual system with a file of comparable size the
searcher would not make so many coordinations or so
accurately as the machine. In fact, one would probably
expect to get less than a third of the coverage at
about three times the cost."
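
The arithmetic in the passage quoted above can be laid out step by
step. The sketch uses only the two cost components itemized in the
quotation (machine time and searcher set-up time) with the rounded
figures as given; on those figures the sum comes to slightly under the
$80.00 to $90.00 stated, which presumably reflects the approximate
nature of the inputs. The variable names are introduced here for
illustration.

```python
# Figures as quoted; each is described in the source as approximate.
MACHINE_RATE_PER_MINUTE = 6.00        # IBM 7090, dollars per minute
RUN_MINUTES = 3                       # machine time for a search-only run
SEARCHES_PER_RUN = 100                # assumed average key word questions per run
KEY_WORD_QUESTIONS_PER_CUSTOMER = 8
SEARCHER_RATE_PER_HOUR = 10.00        # "typical engineering rate"
SETUP_MINUTES_PER_CUSTOMER = 30

# 100 key word questions at 8 per customer serve 12 whole customers.
customers = SEARCHES_PER_RUN // KEY_WORD_QUESTIONS_PER_CUSTOMER

machine_cost = MACHINE_RATE_PER_MINUTE * RUN_MINUTES
labor_cost = customers * (SETUP_MINUTES_PER_CUSTOMER / 60) * SEARCHER_RATE_PER_HOUR
total_cost = machine_cost + labor_cost
cost_per_customer = total_cost / customers

print(customers, machine_cost, labor_cost, total_cost, cost_per_customer)
# prints: 12 18.0 60.0 78.0 6.5
```

On these assumptions the labor of setting up questions, not the machine
time, is the larger share of the cost of a run.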

9.3.5  Requirements  and/or  Potential  Refinements 

In the same paper referenced above, Mr. Dennis enumerated
the following plans "to improve the efficiency and effectiveness of
retrieval while reducing over-all system cost."

"Machine programming is under way which will
significantly reduce operating costs. After the
current phase of machine program development is
complete we expect to extend our current system
logic to include logical sum and difference. And
of course we will always remain alert to possibilities
to utilize the capabilities of our machine
more fully and more efficiently. In addition to
machine program development we have a great deal
of basic vocabulary work to be done... First of
all, we are very much in need of a thesaurus for
our retrieval system. Although our initial work
on the thesaurus is being performed with the
searcher primarily in mind, it is entirely
conceivable that some day we will build it into our
machine system. Secondly, there are problems
which would be greatly helped by a post-also
routine in our machine system. We see this technique
being used in only very special cases since
we wish to avoid adding unnecessarily to the
length of our inverted file.

"We believe that an automatic "post to ___ instead
of ___" routine by the machine will eventually go a
long way toward bringing the synonym problem under
control. We have already started this refinement
utilizing a mechanical editing process prior to
updating of the unifile. We are giving serious
consideration to an optional see also program for
the machine. This refinement appears to be entirely
feasible and will not result in a great increase
in the machine operating cost. It appears that the
inclusion of a document's uniterms along with its
conventional abstracts as machine output would aid
the literature searcher considerably. Also since
uniterms are selected from the entire document
rather than just the abstracts it would help the
searcher and his client to more fully evaluate the
potential value of documents identified during the
search. It was noted earlier in this paper that one
of the features of our present system is a 'no-blank
sort' refinement. Unfortunately the 'no-blank sort'
is on an alphabetical basis. We wish to modify this
feature and have an optional priority control. It
appears highly desirable to build an automatic
cumulative system experience analysis feature into
our present machine system. The machine could tell
us what key words have been used in questions, their
frequency of use, what abstracts have been called for,
their frequency, etc. Such an automatic continuing
experience analysis would be helpful in
development."

During the visit to the Center it was noted that records are kept
of output of questions. On this record, the technical evaluator who
has scanned the material of the output has indicated whether the
document was pertinent to the request, somewhat pertinent, or not at
all pertinent. When asked whether any count had been made of the
nonpertinent output to determine system effectiveness, Mr. Dennis replied
that this had not been done because it would in fact not be a test of
the system. He pointed out that a "false drop" is very hard to define.
For one thing, he says, whether or not the material answers the query
is subject to the opinion of (1) the man who indexes, (2) the searcher
who formulates the question, (3) the technical evaluator, and (4) the
user. A machine, he points out, may correctly retrieve nonpertinent
information. It may for example retrieve an item which suits the
search in all respects except, say, temperature range or geographical
location. It may only be the user who finally makes this decision.

The G.E. system makes no use of roles or links. Mr. Dennis
suggested that the problems that may be solved by their use may just
as easily be solved by introducing a third term in the search question.
He feels that the addition of roles and links may unnecessarily
complicate the system, and that perhaps the cost would be unjustifiably
increased.

9.4  Western Reserve University Center for Documentation and
Communication Research.

9.4.1  Introduction 

Although WRU is also engaged in work with the
American Diabetes Association, Communicable Disease Center (U.S. Public
Health Service), U.S. Office of Education, and others, this report
describes only the American Society for Metals Documentation Service,
since this is a fully operational information searching system.

Several reports have been prepared by the WRU group for the
National Science Foundation (NSF-G-10338) which describe the ASM
operation in some detail:

Test Program for Evaluating Procedures for the Exploitation
of Literature of Interest to Metallurgists

Part I.    Development of an Operational Machine
           Searching Service.

Part II.   Acquisition of Documents for Machine
           Searching.

Part III.  Analysis and Quality Control

Part IV.   A Cost Analysis of Abstract Preparation
           and Processing for an Operational Service

Part V.    The Semantic Code Today

The information in this report is a summary of the material
contained in those five reports and in a report titled "A Case History for
Test Program for Evaluating Procedures for the Exploitation of
Literature of Interest to Metallurgists." Although the following report
is almost completely extracted from this literature, it is supplemented
by information obtained during a visit to WRU by the Documentation
Incorporated study group. However, since the WRU-ASM system is being
subjected to careful study and analysis both by outside observers and
by the users and operators of the services, this report makes no
attempt to evaluate the system.

As mentioned elsewhere in this study, there is considerable
disagreement as to the efficiency and economy of term-on-item files.
The most common criticism of term-on-item files is that the entire
store must be searched, whereas in item-on-term systems only the postings
on the question terms need be considered. Inasmuch as WRU processes its
files on a GE-225 which can conduct simultaneous searches, and queries
can therefore be "batched", the objection to searching the entire file
is somewhat overridden. The GE-225 can search as many as 99 questions
simultaneously; WRU generally searches 20-30 questions simultaneously.
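
The effect of batching on a term-on-item file can be sketched as a
single pass over the store in which every record is tested against
every question in the batch. This is only an illustration: the real
WRU-ASM searches involve semantic codes, roles, and punctuation, which
are not modeled here, and the records, question identifiers, and
function name are hypothetical. Only the limit of 99 questions comes
from the text.

```python
from typing import Dict, Iterable, List, Set, Tuple

MAX_BATCH = 99   # from the text: as many as 99 questions at once


def batched_search(records: Iterable[Tuple[int, List[str]]],
                   questions: Dict[str, Set[str]]) -> Dict[str, List[int]]:
    """Answer all questions in one sequential pass over the file.

    Each question is a set of terms that must all be present (plain AND).
    """
    if len(questions) > MAX_BATCH:
        raise ValueError("too many questions for one batch")
    hits: Dict[str, List[int]] = {qid: [] for qid in questions}
    for doc_no, terms in records:            # the entire store is read once
        present = set(terms)
        for qid, required in questions.items():
            if set(required) <= present:
                hits[qid].append(doc_no)
    return hits


records = [
    (1, ["STEEL", "CORROSION", "SEA WATER"]),
    (2, ["ALUMINUM", "CORROSION"]),
    (3, ["STEEL", "WELDING"]),
]
questions = {
    "Q1": {"STEEL", "CORROSION"},
    "Q2": {"CORROSION"},
    "Q3": {"TITANIUM"},
}
hits = batched_search(records, questions)
```

The cost of reading the whole file is paid once per batch rather than
once per question, which is the sense in which batching offsets the
usual objection.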

WRU's ASM system is of special interest to the users of
coordinate indexing in that it utilizes several controversial aids: role
indicators; punctuation, which may also be called links; and a semantic
code, which is analogous to a thesaurus. These devices have been
subject to considerable controversy. Some say roles, links, and/or
thesauri are entirely unnecessary; others would have only limited use
of one or two of these devices; a third group claims that although these
aids may be useful, they are too complicated, i.e., too costly.

It should be pointed out that, as far as could be determined by
the visit made to the Center, the WRU group is not enamoured of roles,
links, the semantic code, nor the GE-225, per se. These have, however,
been deemed valuable to the ASM system. Conscientious revision and
monitoring of codes and procedures are continuously performed in an
attempt to best meet users' needs. There is much self-criticism and
very little, if any, insistence that the ASM system could or should be
a model for any other. It was remarked, for example, that in its study
for the Office of Education, the group was encountering somewhat
different problems in vocabulary and user requirements. (It was pointed
out, however, that the basic principles do seem to hold.)

It should also be pointed out that the WRU-ASM system may seem
somewhat, perhaps unnecessarily, complicated to those who are more
familiar with item-on-term systems than with term-on-item systems.
Both for this reason and because of the nature of their subject matter,
it is not to be expected that their techniques should apply to all
systems. The WRU group, however, believes that because a term-on-item
system can be converted to an item-on-term system, or vice versa, many
of the principles which have evolved from its work may have general
applicability.


9.4,2  Background  and  General  l.nfprniUfln 

Sine*  1955  the  American  Society  for  Metal*  has  provided 

financial  as* 1  stance  to  the  Center  for  processing  the  metallurgical 
literature.  The  program  has  consisted  of  two  parts*  (1)  preparation  of 
short  English  abstracts (cal led  "conventional") for  publication  In  the 
Review  of  Metal  Literature  and  (2)  experiment*!  and  pt lot  studies  In  the 
preparation  of  encoded  abstracts (cal led  "telegraphic") for  use  in 
searching  the  literature  by  machine  methods.  Although  these  two  parts 
have  been  conducted  concurrently,  the  first  was  an  operational  service 

when  the  Center  assumed  responsibility. 

From  1955  to  1957 ; work  wes  conducted  on  a  semantic  code  dictionary 
(a  machine-coded  dictionary),  and  the  groundwork  was  laid  to  establish 
procedures  for  transferring  recorded  Information  from  English-language 
abstracts  to  machine-encoded  symbols.  Methods  were  developed  for  train¬ 
ing  abstracters  In  the  preparation  of  telegraphic  abstracts,  for 
transferring  this  Information  to  machines,  and  for  testing  and  evaluating 
the  results  of  each  step. 

During 1958 the Center completed 12,000 conventional abstracts for
publication in the Review of Metal Literature; telegraphic abstracts
were prepared for 4,500 of these. In 1959, of the 12,000 conventional
abstracts produced for the Review, telegraphic abstracts were prepared
for 7,500. In addition, during the year, it was decided to conduct
some experimental searches. A number of interested users in Government
and industry agreed to submit sample questions for continuous searching
of the current literature over a given period of time. As a result of
the apparent feasibility of the system, ASM inaugurated the Metals
Documentation Service in January 1960, with 12,000 conventional and
12,000 telegraphic abstracts scheduled for production in 1960.

During the period of testing and evaluating, it became increasingly
evident that there are many scientific areas closely allied to metallurgy
(e.g., physics, inorganic chemistry, geology) which had not previously
been included. Application was made to the National Science Foundation
for funds to support a test and evaluation of a much expanded activity.
A grant was received in December 1959 for this purpose. Plans were then
made to process an additional 22,000 - 23,000 conventional and
telegraphic abstracts, which would increase the 1960 output to about
35,000.

By the end of December 1960 conventional and telegraphic abstracts
had been prepared for about 39,000 articles, of which 34,000 were
completely processed and put on tape ready for searching. The remaining
5,000 were processed early in 1961.

9.4.3 Costs

In its analysis of input costs gleaned from one year (1960)
of operation of the ASM system, WRU has calculated a cost of $6.50 per
item. This includes 15% overhead, 4% employee benefits (on personnel
costs), cost of acquisition, abstracting, coding, punching, equipment,
supplies, etc. (As mentioned earlier, a report for NSF on output costs
is in process.)

The  abstracting  staff  consisted  of  two  full-time  and  50  part-time 
members  during  the  period  covered  by  the  report.  Part-time  abstracters 
are  paid  by  the  piece.  The  following  cost  figures  were  given  for 
abstract  preparation:  an  average  of  $2.0395  to  prepare  both  a 
conventional  and  a  telegraphic  abstract  from  a  full  article;  $1.3265 
to prepare a telegraphic from a full-length article when the author
abstract is used for the conventional; and $0.8017 to prepare a
telegraphic from an abstract in an abstract journal. The combined average
is $1.5326 per abstract.

A summary of the input costs is reproduced on the following page.

9.4.4 Telegraphic Abstracts

The  telegraphic  abstract  is  one  of  the  essential  portions 
of  the  system.  It  is  prepared  in  addition  to  a  conventional  abstract 
and  is  an  "index"  to  be  read,  ultimately,  by  a  machine. 

A telegraphic abstract is made up of (1) significant words
selected  from  the  articles,  (2)  code  symbols  called  role  indicators 
which  fit  the  selected  words  into  context,  and  (3)  punctuation  symbols 
which  separate  and  group  the  words  and  role  indicators  into  various 
units  in  somewhat  the  same  fashion  as  conventional  punctuation  does. 

WRU has made the following assumptions in formulating their
indexing procedures for a machine searching system:

1.  The  names  of  materials,  their  properties,  processes  which 
they  undergo  and  the  conditions  of  these  processes  can  be  used  as 
index  terms. 

2. Certain roles which the words designating materials,
properties, processes and conditions can play in the context of the
subject matter are important in indexing. The devices for designating
the role of a word in context are the role indicators. (This device
is used to avoid the syntactic problems in the popular example: water
cooling. By affixing role indicators it can be determined whether (1)
the water is used for cooling, or (2) the water is cooled.)


3. It is useful for an index to show how certain words are
grouped together. The devices used to group units of information are
called punctuation symbols. (This device is used to avoid the problems
found in a set of terms like "gold", "silver", "watch", and "ring".
Does the document discuss gold watches and silver rings, or silver
watches and gold rings?) Some systems attach a letter or number to the
item codes as links, e.g., gold - 11706A; watch - 11706A; and silver -
11706B; ring - 11706B. The WRU indexer uses a double dot (..) to mark
the beginning of each associated set of words, e.g.,
..gold.watch..silver.ring. On the magnetic tape these punctuation
symbols are numeric codes to indicate the level of grouping, e.g., an
"8" may appear at the beginning of each abstract, a "7" may appear at
the beginning of large groupings within the abstract, a "6" might
indicate a smaller grouping within the larger, etc.

For an illustration of this, see Paragraph 3.6, Example of Input.

4. The index terms can be encoded into an artificial
language which will act as a thesaurus to show the "areas of meaning
which various words partake of", so that in using the index, if the
words of the question mean the same thing as the words in the index, the
document will be found. The device used to achieve a thesaurus function
for the words selected is called the semantic code. (This will be
discussed at greater length in Paragraph 3.4.)
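
The effect of role indicators (assumption 2) and of grouping by
punctuation or links (assumption 3) can be sketched in a few lines of
Python. This is only an illustration: the role names, the group letters
"A" and "B", and the helper functions are hypothetical and are not WRU's
actual codes or search program.

```python
# Hypothetical postings for one document: (term, role, link group).
# Role names and group letters are invented for illustration.
postings = [
    ("gold", "material", "A"),
    ("watch", "product", "A"),
    ("silver", "material", "B"),
    ("ring", "product", "B"),
]


def matches_unlinked(postings, wanted):
    """Plain coordination: every wanted term appears somewhere."""
    terms = {term for term, _, _ in postings}
    return set(wanted) <= terms


def matches_linked(postings, wanted):
    """Linked coordination: all wanted terms share one link group."""
    groups = {}
    for term, _, group in postings:
        groups.setdefault(group, set()).add(term)
    return any(set(wanted) <= members for members in groups.values())


def matches_role(postings, term, role):
    """Role check: the term must appear with the requested role."""
    return any(t == term and r == role for t, r, _ in postings)


# "gold ring" is a false coordination for this document.
print(matches_unlinked(postings, ["gold", "ring"]))  # True (false drop)
print(matches_linked(postings, ["gold", "ring"]))    # False
print(matches_linked(postings, ["gold", "watch"]))   # True

# "water cooling": water as the cooling agent versus water being cooled.
cooling_doc = [("water", "agent", "A"), ("cooling", "process", "A")]
print(matches_role(cooling_doc, "water", "agent"))    # True
print(matches_role(cooling_doc, "water", "patient"))  # False
```

Without the group check, the question "gold ring" retrieves a document
that discusses only gold watches and silver rings; with it, the false
drop disappears.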




The figure below shows a portion of a telegraphic abstract.

    1.  ..KEJ,        2.  ROD
    3.  .KUJ,         4.  ALLOY
    5.  .KUJ,         6.  AL
    7.  ..KAM,        8.  ANNEALING

The double dots indicate the grouping of the terms rod, alloy, and
aluminum. The comma indicates that KEJ is one unit of information and
that rod is another which is associated with it. The single dot
indicates that KUJ and alloy (which are associated with each other by
the comma) are associated with the other terms between the double dots.

KEJ is a role indicator. It shows that the word "rod" is the name
of a material which is acted on by a process. When the other role
indicators are translated, the information in the sample is as follows:
a rod is shown as being composed of an alloy whose major constituent is
aluminum, and this aluminum alloy rod is being subjected to the process
of annealing.
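
The punctuation rules just described can be read as a small grammar. The
sketch below parses a telegraphic abstract written on one line; it
assumes the sample reads ..KEJ,ROD.KUJ,ALLOY.KUJ,AL..KAM,ANNEALING and
that the written symbols, rather than the numeric level codes used on
tape, are being processed. The function name is hypothetical.

```python
def parse_telegraphic(text):
    """Split a telegraphic abstract into groups of (role, term) pairs."""
    groups = []
    # A double dot opens each associated set of words.
    for chunk in text.split(".."):
        chunk = chunk.strip()
        if not chunk:
            continue
        pairs = []
        # A single dot separates the units inside one group.
        for unit in chunk.split("."):
            unit = unit.strip()
            if not unit:
                continue
            # A comma ties a role indicator to the word it qualifies.
            role, _, term = unit.partition(",")
            pairs.append((role.strip(), term.strip()))
        groups.append(pairs)
    return groups


sample = "..KEJ,ROD.KUJ,ALLOY.KUJ,AL..KAM,ANNEALING"
for group in parse_telegraphic(sample):
    print(group)
```

The first group carries rod, alloy, and aluminum with their role
indicators; the second carries the annealing process.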


9.4.5 Semantic Code

Each term (word) in the telegraphic abstract is coded into
the semantic code.


* Mortimer Taube of Documentation Incorporated has prepared a paper
which points out that the WRU semantic code and hence the searching
system based upon it can be evaluated by comparing the generic relations
embodied in the code with generic relations found in the literature
being indexed or coded. The paper, "A Note on the Evaluation of the WRU
Semantic Code as an Example of Generic Coding", will be published in the
April 1962 issue of American Documentation.


which may be interpreted as "a crystalline form composed of carbon and
characterized by hardness." (The codes have been arbitrarily limited to
four factors.) Any one of the factors may be searched. That is, the
items indexed by "diamond" would be retrieved whether the question
called for that specifically or for "things composed of carbon",
"hard things", or "crystalline forms".

Actually, CERB CWRS PYPR 1028 is not the complete code for diamond,
since under the principles of the semantic code this would be true not
only of diamonds but of any other, say, hard crystals of carbon. To
specify that diamonds and only diamonds are wanted, a further element,
the numerical suffix, must be given. This is a four-digit figure,
the first numeral of which is that of the number of factors in the term.
The other three are those peculiar to the particular concept. In this
instance the numerical suffix might be 3001. Note that one of the
factors, PYPR, is followed by the numerical infix 1028. This 1028 is
the identifying numerical suffix for that particular physical property
(P-PR), "hardness". Its use as a numerical infix here shows that the
particular physical property characterizing (Y) diamonds is hardness.
Only numerical suffixes beginning with 1 can be used for infixes. Since
2's, 3's, and 4's indicate that the code for the concept has more than
one factor, their use as infixes would make it impossible to
particularize specific concepts within the generic framework of a code.

This is further explained by the definitions:

"Semantic factor. By this term is meant the separate units of
a code, expressed by the three consonants. In RAML RWHT TQMS 1002 3679,
the semantic factors are R-ML, R-HT, and T-MS. Each semantic factor
represents one of a number of highly generic concepts. Together they
form, as it were, the building-blocks of the code. It should be noted
that within a code composed of more than one semantic factor, the
separate semantic factors are arranged alphabetically ignoring the
infixes.

"Infix. By this term is meant certain symbols used with the
semantic factors in a code. In RAML RWHT TQMS 1002 3679, the infixes
are A, W, Q, and 1002.

"Alphabetic infix. By this term is meant the infixes
represented by alphabetic symbols. They show the analytic relationships
of the semantic factors in which they appear to the concept represented
by the code.

"Numerical infixes. By this term is meant the infixes
represented by numerals following the symbol $. They show, where used,
a degree of particularization in the semantic factor to which they are
affixed. Actually every semantic factor may be thought of as
possessing a numerical infix; however, only in certain instances are they
explicit, that is, they actually appear in the code. In the majority of
instances, they are implicit, that is, they represent a numerical infix
'1001' which is not actually printed out.

"Numerical suffix. By this term is meant the particularizing
number assigned each individual code to distinguish it from all other
codes which, though they represent different concepts, contain the same
semantic factors."

The semantic code, then, attempts to eliminate the manual or
machine "see also" or "see" references. Furthermore, it attempts to
include generic levels, as well as particular characteristics.
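
The definitions above can be illustrated with a short Python sketch that
splits a code into semantic factors and infixes and then performs a
generic search on one factor. Several things here are assumptions made
for the example: the units are written space-separated, a numerical
infix is attached to its unit with a colon, and the diamond code is
taken to be CERB, CWRS, PYPR with infix 1028 and suffix 3001. None of
this notation is WRU's own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    factor: str         # semantic factor, e.g. "P-PR"
    infix: str          # alphabetic infix, e.g. "Y"
    numeric_infix: str  # explicit numerical infix, or "" when implicit


def parse_code(code):
    """Parse 'CERB CWRS PYPR:1028 3001' into (units, numerical suffix)."""
    *unit_tokens, suffix = code.split()
    units = []
    for token in unit_tokens:
        letters, _, number = token.partition(":")
        # Letters 1, 3 and 4 form the factor; letter 2 is the infix.
        factor = f"{letters[0]}-{letters[2:]}"
        units.append(Unit(factor, letters[1], number))
    return units, suffix


def has_factor(code, factor):
    """Generic search: does the code contain the given semantic factor?"""
    units, _ = parse_code(code)
    return any(unit.factor == factor for unit in units)


# Diamond as discussed in the text, with the illustrative suffix 3001.
dictionary = {"diamond": "CERB CWRS PYPR:1028 3001"}

units, suffix = parse_code(dictionary["diamond"])
print([u.factor for u in units])  # ['C-RB', 'C-RS', 'P-PR']
print([u.infix for u in units])   # ['E', 'W', 'Y']
print(suffix)                     # 3001

# A question asking for "things composed of carbon" (factor C-RB)
# retrieves diamond without naming it.
print([t for t, c in dictionary.items() if has_factor(c, "C-RB")])
```

A specific search for diamond alone would additionally compare the
numerical suffix, which this sketch returns but does not test.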

The code, of course, varies according to the system requirements.
For example, in some systems the following code for diamond may be more
valuable: CERB CWRS GUMM MANR. This may be interpreted as "a mineral
in crystalline form composed of carbon and used as a gem".

For  comparison,  the  A.I.Ch.E.  thesaurus  entry  for  "diamond"  is 
shown  below. 


DIAMOND 

PO  Carbon 

RT  Abrasives 
RT  Crystal 


PO  =  Post  (also)  on,  and  RT  =  Related  term 


It is obvious that it is not necessary in any one system to include
all possible meanings, relationships, or generic levels. For example,
diamond in the sense of a baseball diamond would certainly be
superfluous in a metallurgical system. However, whether or not WRU's four
factors are sufficient and whether or not the factors chosen best suit
the users' needs are both open to debate. The code is subject to
constant evaluation and revision.

9.4.6  Coding 

The terms of the telegraphic abstract are punched on IBM
cards, one term to a card. When a day's production, about 130 abstracts,
has been keypunched, the collected abstracts are considered a "block"
and are ready for the next step: matching with the semantic code
dictionary. The average block contains about 5000 cards, with roughly
50% of these cards representing terms, 20% special words such as
chemicals and proper nouns, and 30% role indicators and information about
levels of logic.

The terms in each block are then separated from the role indicator,
punctuation, and title cards, and are sorted alphabetically. They are
next compared on a collator with the deck of cards representing the
semantic code dictionary. When there is a matching term in the
dictionary (and there is, in over 90% of the cards), the proper semantic
code is automatically punched in the card with the term. In the 10% of
the cases in which there is no match, the card is rejected.

When the process is complete, the rejected cards are batched
and listed.
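
The matching step behaves like a merge of two sorted decks. The
following Python sketch shows that idea only; the deck contents and the
placeholder codes are invented, and the function is not a description of
the actual collator wiring.

```python
def collate(term_cards, dictionary_cards):
    """Match sorted term cards against a sorted dictionary deck.

    term_cards: list of terms, sorted alphabetically.
    dictionary_cards: list of (term, code) pairs, sorted by term.
    Returns (encoded, rejected).
    """
    encoded, rejected = [], []
    i = 0
    for term in term_cards:
        # Advance the dictionary deck until it catches up with the term.
        while i < len(dictionary_cards) and dictionary_cards[i][0] < term:
            i += 1
        if i < len(dictionary_cards) and dictionary_cards[i][0] == term:
            encoded.append((term, dictionary_cards[i][1]))
        else:
            rejected.append(term)
    return encoded, rejected


# Hypothetical decks; the codes are placeholders, not semantic codes.
dictionary_cards = sorted([
    ("ALLOY", "CODE-1"),
    ("ANNEALING", "CODE-2"),
    ("ROD", "CODE-3"),
])
term_cards = sorted(["ROD", "ALLOY", "ANNEALLING", "ALLOY"])

encoded, rejected = collate(term_cards, dictionary_cards)
print(encoded)
print(rejected)
```

Because both decks are sorted, each is passed through only once. The
misspelled term is rejected and would appear on the listing for manual
correction.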

There are generally a certain number of spelling, keypunching,
collation, and other errors. Almost half of the rejected terms are the
result, however, of differences between the abstracters' terminology
and that of the dictionary. About a fifth of these terms are caused by
the abstracters' use of multiple-word terms which should actually have
been broken down into terms which appear individually in the dictionary.
The other four-fifths are caused largely by the use of spellings,
inflections, and synonyms which, though perfectly correct, do not appear
in the dictionary.



This latter factor could very nearly be eliminated by simply
assigning to each variant term the code proper to the originally
appearing form of the term. Through this, all future appearances of the
variant would be automatically encoded, and the manual processing would
not be required. This was the procedure originally envisaged and put
into effect. With the increasing size of the dictionary, however, the
decision has been made that because of the time required by the
collating procedure it is better not to increase the number of
dictionary entries by including variants except where these are very
frequent. This speeds up the automatic encoding procedure at the
expense of slowing down the manual encoding procedure, but it is felt
that the over-all encoding process is more efficient. Faster matching
procedures than those provided by the collator would naturally affect
this decision.
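
The policy of admitting only very frequent variants can be pictured as a
simple frequency filter over past reject listings. In the sketch below
the reject history, the threshold, and the function name are all
hypothetical; the source does not say how "very frequent" was judged.

```python
from collections import Counter


def variants_worth_adding(rejected, min_count):
    """Pick variants frequent enough to justify a dictionary entry.

    rejected: list of (variant, dictionary_term) pairs taken from
    reject listings after manual correction.
    min_count: hypothetical frequency threshold.
    """
    counts = Counter(rejected)
    return {
        variant: term
        for (variant, term), n in counts.items()
        if n >= min_count
    }


# Invented reject history for illustration.
rejected = (
    [("ANNEALED", "ANNEALING")] * 5
    + [("ALLOYS", "ALLOY")] * 2
    + [("RODS", "ROD")]
)
print(variants_worth_adding(rejected, min_count=3))
```

Raising the threshold keeps the dictionary deck short and the collating
pass fast, at the price of more manual corrections.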

When all of this has been done, there remain a number of terms
which are in fact new to the dictionary. These must now be encoded and
inserted if they are judged acceptable.

In the early stages of the operation, new terms represented a
comparatively high percentage of the listings. After the first development
of the semantic code dictionary, new terms have come into the dictionary
only as they appear in the material being encoded. By now, the
dictionary has so increased in size (21,385 terms, excluding chemicals)
that only a very small percentage of the terms in the material
encoded are not already in the dictionary. Of eight listings which
were analyzed (representing blocks with a total inclusion of
approximately 16,000 terms) only about 0.6 of one percent of the terms
were new to the dictionary. Of course a change to other fields (medicine,
law, sociology, etc.) outside the physical sciences would increase
tremendously the percentage of new terms.

The items on each listing are corrected by the encoder in
accordance with the reason for the appearance of the term on the listing.
In most instances nothing is required but to correct the term to fit the
entry in the semantic code dictionary. With those few terms which
require new codes, however, more elaborate procedures are necessary.
The new terms are analyzed both as to meaning and to the aspects of the
concept represented by the term which seem most likely to be useful in
the searching procedures, and the new code is assigned.

A complete discussion of the analysis required and the procedures
of assigning new codes is given in Appendix D of the report referenced
previously, The Semantic Code Today.


9.4.7  Example  of  Input 

Reproduced in the following four pages are (1) a sample
conventional abstract, (2) the telegraphic abstract of the same item,
(3) the semantic codes for the index terms used, and (4) the encoded
abstract on tape.

700-0. High Speed Forming of Metal [...]

9.4.8 Searching

The telegraphic abstract, the encoded terms, and the search
program taken all together comprise a machine information retrieval
system within which the following logical search devices are exploited:


1. The logical product. This means that the machine
searching program can require that to answer a
question, a document must contain every
characteristic specified in the search program.


2. The logical sum. This means that the machine
searching program can require that to answer a
question, a document may contain any one of two
or more characteristics specified in the search
program.


3. The logical difference. This means that to answer
a question a document must contain one or more
characteristics but not a certain other
characteristic or characteristics as specified in the
search program.
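
The three devices correspond to AND, OR, and AND NOT applied to the
characteristics recorded for each document. The sketch below scans a
small term-on-item file serially, as a tape search would; the document
numbers, the characteristics, and the function names are hypothetical.

```python
# Hypothetical term-on-item file: document number -> characteristics.
documents = {
    1: {"extrusion"},
    2: {"extrusion", "impact"},
    3: {"impact"},
    4: {"explosive"},
    5: {"extrusion", "impact", "explosive"},
}


def logical_product(*terms):
    """Document must contain every listed characteristic."""
    return lambda chars: all(t in chars for t in terms)


def logical_sum(*terms):
    """Document may contain any one of the listed characteristics."""
    return lambda chars: any(t in chars for t in terms)


def logical_difference(required, excluded):
    """Document satisfies `required` but not `excluded`."""
    return lambda chars: required(chars) and not excluded(chars)


def search(documents, question):
    """Scan every document in turn and keep those that answer."""
    return sorted(n for n, chars in documents.items() if question(chars))


print(search(documents, logical_product("extrusion", "impact")))
print(search(documents, logical_sum("impact", "explosive")))
print(search(documents, logical_difference(
    logical_product("impact"), logical_sum("explosive"))))
```

Because each device returns a test on a single document, the devices can
be nested to build up a complete search question.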

Three pages are here reproduced which show a sample test question,
its logical analysis, and its structure.

(Information requested on:)

High velocity deformation of metals, including explosive loading

Further inquiries by the Center revealed that abstracts containing
the following information were of interest to the questioner:

1. Impact extrusion

2. Velocity of forging die (stated as a function of width
increase of forged die)

3. Impact loading for strain tests

4. Explosive hardening

In conclusion, then, it can be said that the WRU-ASM system
is operational, seems to be meeting users' requirements, and is being
subjected to evaluation and improvement.

This system has several aspects which are subject to debate, such
as the term-on-item file, the semantic code, roles, and links. WRU has
published and is preparing several reports of a statistical, analytical
nature to show how, how often, and how well the system works for its
intended purpose. The debates, it seems, must either wait for the
results of these quantitative studies or remain on a purely theoretical
basis.

Arguments in documentation have long been on a theoretical basis
only, and WRU is to be commended for its efforts to quantitate and
analyze its operational data. As long as this is done quite objectively,
much can be learned and much can be done toward proving or
disproving theoretical arguments.

9.5 Documentation Incorporated

9.5.1  Introduction 

Documentation Incorporated has had extensive experience
in information systems design and operation. The projects described
in succeeding sections are "case histories" chosen to illustrate
specific points.

9.5.2 Uniterm Index to U.S. Chemical Patents

One of the largest coordinate indexing operations
sustained for a prolonged period is represented by this Index.

The Index, owned and marketed by Information for Industry, Inc.,
was initiated in 1955 and has continued through 1961. During this
period, Documentation Incorporated prepared indexes for chemical
patents issued from 1950 through 1961.

The  product  is  a  double-dictionary  Uniterm  Index,  published 
every  two  months  in  updated  cumulative  form,  and  sold  by  subscription. 
An  assignee  listing,  a  patentee  listing,  and  patent  excerpts  indexed 
by  accession  number  are  included.  (The  accession  numbers  are  used 
for  coordination  and  retrieval.) 

All patents appearing in the Chemical Section of the Official
Gazette were automatically included, and a selection was made from
those announced in the General and Mechanical and Electrical sections.

Production procedures have ranged from hand-posting to a machine
operation on the IBM 305 (RAMAC). In 1961 the camera-ready copy was
produced by means of an IBM 1401. The so-called minor terms, usually
names of compounds, were listed separately in the Index. (This listing
was prepared more conventionally.) This separation permitted ready
access to new information, particularly to new compounds.

The initial vocabulary resulted from "free indexing" of the
patents; it was not pre-established except as the past training and
experience of the indexers affected its structure. The vocabulary was
derived directly from the patents; new terms were added where necessary,
but each addition was carefully checked. A system of see references
was mandatory because of the many synonyms for the names of chemical
compounds and the differences in nomenclature.

To illustrate the growth of the vocabulary in terms of the
number of patents: in 1955, the 6,065 patents were indexed by 3,700
major terms (13,000 minors); in 1961, the 10,982 patents were reflected
by a major-term vocabulary of more than 7,000 terms (50,000 minors,
estimated).

Because postings on certain terms were so numerous, and since this
was a manual retrieval system, two methods were developed for reducing
the bulk of postings. In the first method, see also references were
established to indicate that when a term heading was followed by
[...], a complete generic search could only be obtained by
additionally consulting all the terms enumerated under [...].

The other method for controlling posting density retained the
existing relationship between certain terms such as "oil" and
"treatment", or "polymerization" and "catalyst". These terms were
pre-coordinated by the indexers and published as bound terms.

The usefulness of the service is attested by the rise in the
number of subscriptions to more than 100 and by the fact that
several companies were actually able to discontinue their patent
indexing activities.

9.5.3  U.  S.  Army  Chemical  Research  and  Development 
Laboratories. 

A  pilot  project  was  designed  to  provide  a 
feasibility  study  and  test  of  the  effectiveness  and  suitability  of  an 
indexing  system  based  upon  key  words  and  filed  on  punched  cards  for 
storage  and  retrieval  of  references  by  coordination,  utilizing 
relatively  low-cost  equipment,  such  as  the  IBM  9900  Special  Index 
Analyzer.  Adaptability  of  this  system  to  high-speed  computer 
operations  was  also  considered. 

A sample based on 2,000 selected internally-generated reports
was used for the pilot project. An Ad Hoc Committee representing the
various research elements monitored the program on the contract and
conducted a retrieval test based upon 75 especially-prepared questions to
test accuracy and depth of indexing and retrieval efficiency.*
Evaluation of the results led to the recommendation that the Laboratories
should convert the Technical Library to this system of coordinate
indexing.

* (It is interesting to note that the vocabulary developed for this
contract was unusually large, about 12,000 terms.)

This pilot project included an experiment in the application of
a method of "two-level indexing". This method can be described as
follows: In the major index, compounds are treated as units, and the
full name of the compound is considered a term in the system. Postings
for documents appear under such terms, just as they do under the other
terms of the system. Supplementing the major index is another index to
compounds where the terms are the parts of the compounds, and the
compounds become items in the system. The compounds are numbered and
posted under appropriate terms. A generic search would then proceed
in two steps, one a search in the compound index for all compounds
indexed under certain generic terms, e.g., all compounds which are
dichloro derivatives and heterocyclic. Coordination at this level will
deliver a set of numbers which represent compounds used as terms in the
major coordinate index. In the second step, the basic question with
which a searcher is concerned might be: "All such compounds used for
insecticides". The numbers delivered by the first search would be
summed and intersected with the other term, "insecticide". This would
then yield the relevant documents in the system. In a mechanized
system, this procedure is much less complicated than is its description.
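
The two-step search can be sketched with set operations. In the Python
example below the compound numbers, document numbers, and postings are
invented for illustration, and the function name is hypothetical.

```python
# Compound index: generic term -> compound numbers (hypothetical data).
compound_index = {
    "dichloro derivative": {101, 102, 103},
    "heterocyclic": {102, 103, 104},
}

# Major index: term -> document numbers. Compounds are terms here too,
# keyed by their compound number.
major_index = {
    101: {1, 2},
    102: {2, 3},
    103: {4},
    104: {5},
    "insecticide": {3, 4, 5},
}


def two_level_search(generic_terms, other_term):
    """Return (compounds found in step 1, documents found in step 2)."""
    # Step 1: coordinate the generic terms in the compound index.
    compounds = set.intersection(
        *(compound_index[t] for t in generic_terms))
    # Step 2: sum the documents posted under those compounds, then
    # intersect the sum with the postings of the other term.
    summed = set().union(*(major_index[c] for c in compounds))
    return compounds, summed & major_index[other_term]


compounds, docs = two_level_search(
    ["dichloro derivative", "heterocyclic"], "insecticide")
print(sorted(compounds))  # [102, 103]
print(sorted(docs))       # [3, 4]
```

The first step never touches the document postings, so the major index
needs no generic posting of compounds.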

9.5.4 Atomic Energy Commission

A pilot experiment was directed toward the
mechanized preparation of indexes to Nuclear Science Abstracts. This
work was an extension of a technique already in use at Oak Ridge,
wherein a "short title" was cited under personal author and corporate
author entries.

The methods comprised establishing and keypunching a uniform
title line for each report, storing these titles in the IBM 305 (RAMAC),
and querying the RAMAC by the terms appropriate to the type of index
desired.

Such a program appeared generally feasible. No significant loss
of information resulted from the substitution of the uniform title line
for the more conventional modifier system used to follow the subject
headings. Machine compilation of the index eliminated typing and
reduced proofreading time. The appearance of the computer printout
(all upper case letters) was markedly improved by the addition of a
"boldface-type" feature.

Although Nuclear Science Abstracts does not employ coordinate
indexing, the work described here is included because of the
applicability of the uniform title line to coordinate indexing systems as
evidenced by its use in the NASA Scientific and Technical Information
Facility operated by Documentation Incorporated.

9.5.5  Air  Force  Office  of  Scientific  Research 

Investigations have been directed toward the
development of a storage and search theory. This theory led to the design
of a mechanized system consisting of a corpus of computer-stored data
and information, with an index to the store maintained externally. The
validity of the concept was put to test as the Experimental Contract
Highlight Operation (ECHO), which was oriented toward the documentation
requirements of scientists administering Air Force research projects
in practically all scientific fields. Documents generated in the
normal course of administration of these projects were used during the
research and for the experimental operation. Over 3,000 research
efforts from 750 organizations were ultimately represented in the store
and in the index; the latter attained a size of some 3,600 subject
terms and 700 terms in administrative categories.

A new contract is directed toward work in multipurpose information
system design with simulation to culminate in the development of IBM
1401 programs. (The original philosophy will be retained in that the
index will continue to be maintained external to the computer store.)

A planned step in the program is an opportunity to conduct,
simultaneously with normal operations, experimentation in which the
documents will be indexed (by their originators) from a list of 12
categories and a vocabulary restricted to approximately 1,000 key words.

Four major problems have been encountered in the operation of
coordinate indexing systems.

(1)  Cost 

(2)  Evaluation 

(3)  Vocabulary  and  Structure 

(4)  User  Education. 

Not listed as a major problem is the mechanization of posting and
search, since this only poses problems peculiar to a particular
environment. One possible exception is in chemical nomenclature. The
length of terms and the complexity of chemical compounds create both a
manipulation and a search problem. In many systems the compound is
simply treated as a term in the over-all vocabulary (with perhaps some
generic posting), assigned a code, or otherwise processed normally.

One of the systems in which the chemical compound problem was
solved relatively easily was at Procter & Gamble, where an IBM 101
electronic statistical machine is used for searching. The solution is
described below.

"... At first we gave each chemical mentioned in a
report a code number ... We found this impractical,
for example, when we encountered a report in which
a chemist itemized results on as many as 50 or more
compounds he had tested for germicidal properties.

Many times individual chemicals are examined once for
a special purpose and, proving unsatisfactory, they
are never [text illegible]


The printout problem of superscripts, subscripts, etc., can be
handled quite simply by devising new conventions.

In some systems a high degree of specificity in chemicals is
required. The Patent Office, for example, has outlined the following
requirements for one of its operations.

"To identify a chemical compound for patent searching
purposes, it is believed that it is necessary for a
system to be able to do several things, especially in
the phosphorus art.

1. To be able to identify each of the fragments
comprising a compound.

2. To be able to identify the number of times
each different fragment occurs in the compound.

3. To be able to find the relationships between
these various fragments in the compound.

4. To be able to ask the search question either
very specifically or generically, that is,
with any degree of genericity desired."

The Patent Office R&D, the above report states, will have available
detailed instructions on fragmenting and nodalization.
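The four requirements quoted above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Patent Office's fragmenting scheme: the fragment names, the GENERIC table, the Compound class, and the toy compound are all hypothetical.

```python
from collections import Counter

# Hypothetical generic hierarchy: specific fragment -> broader class.
GENERIC = {
    "methyl": "alkyl",
    "ethyl": "alkyl",
    "phosphate": "phosphorus group",
    "phosphite": "phosphorus group",
}


def matches(term, fragment):
    """True if `fragment` is `term` itself or falls under it generically."""
    while fragment is not None:
        if fragment == term:
            return True
        fragment = GENERIC.get(fragment)
    return False


class Compound:
    def __init__(self, name, fragments, links):
        self.name = name
        # Requirements 1 and 2: which fragments occur, and how often.
        self.counts = Counter(fragments)
        # Requirement 3: relationships, stored as unordered fragment pairs.
        self.links = {tuple(sorted(pair)) for pair in links}

    def count_of(self, term):
        """Occurrences of `term`, asked specifically or generically (req. 4)."""
        return sum(n for frag, n in self.counts.items() if matches(term, frag))

    def linked(self, term_a, term_b):
        """True if some fragment under term_a is related to one under term_b."""
        return any(
            (matches(term_a, a) and matches(term_b, b))
            or (matches(term_a, b) and matches(term_b, a))
            for a, b in self.links
        )


# A toy compound, invented for illustration only.
ester = Compound(
    "toy phosphate ester",
    fragments=["phosphate", "methyl", "methyl", "methyl"],
    links=[("phosphate", "methyl")],
)
```

A question can then be put at any generic level: ester.count_of("methyl") asks specifically, while ester.count_of("alkyl") and ester.linked("phosphorus group", "alkyl") ask generically.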

A random access method of searching is being used at the Patent
Office, using the RAMAC, as well as a punched card system.

Other groups, having fragmented and/or nodalized compounds, have
used other search means, such as peek-a-boo cards.

To a certain extent, cost and evaluation go hand in hand. Where
cost is high it may be justified in terms of value received. However,
as pointed out in Section 7 of this report, there has been no
satisfactory method for determining the effectiveness, let alone the
value, of a system. Certain measures are available; for example, a
system used solely for general academic research might not justify the
expense of detailed input (deep indexing or introduction of structure).

Where a system is used, on the other hand, to determine whether or
not a specific kind of work has already been done, the savings in total
development cost to either a contractor or the Government may be so
large as to justify a costly system.

Cost  is  also  a  factor  in  determining  whether  or  not  structure  is 
imposed  on  a  system  at  the  input  or  at  the  output.  For  example,  where 
a  large  amount  of  extraneous  material  is  retrieved  during  search,  and 
cost  is  incurred  in  screening,  evaluating,  disseminating  and  discarding 
quantities  of  material,  it  may  indeed  be  worth  using  roles,  links  or 
other  such  devices  at  the  input. 

Here again, no satisfactory method has yet been found of evaluating
a system to a point where the mode and value of structuring can be
determined.

The means and location of structuring in a system will, to a
certain extent, determine the form of the vocabulary and indexing. Where
a computer program, for example, can be used to post and search on
higher and lower generic levels, there is no necessity for complicated
indexing procedures and indexing aids. Of course, thesauri or other
posting instructions have to be developed for the machine.
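As a minimal sketch of how a program might post on higher generic levels, assume the posting instructions take the form of a table giving one broader term per term. The table contents, the function names, and the document numbers below are hypothetical, and the hierarchy is assumed to contain no cycles.

```python
from collections import defaultdict

# Hypothetical posting instructions: term -> next broader term.
BROADER = {
    "stearic acid": "fatty acids",
    "fatty acids": "organic acids",
    "greases": "lubricants",
}


def generic_chain(term, broader=BROADER):
    """Return the term followed by every broader term above it."""
    chain = [term]
    while chain[-1] in broader:
        chain.append(broader[chain[-1]])
    return chain


def post(index, doc_id, terms, broader=BROADER):
    """Post a document under each assigned term and all of its broader terms."""
    for term in terms:
        for level in generic_chain(term, broader):
            index[level].add(doc_id)


index = defaultdict(set)
post(index, 101, ["stearic acid", "greases"])
post(index, 102, ["fatty acids"])
```

The indexer assigns only the specific term; a search on "organic acids" still retrieves both documents because the program did the generic posting.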

At this time, the vocabulary problem is probably the most critical.
Much of the work underway in this area is detailed in Sections 4 and 5
of this report.

There  is  doubtlessly  some  limit  below  which  a  vocabulary  can  be 
totally  free  and  without  complications  such  as  roles,  links,  and 
thesauri.  Where  the  limit  lies  and  how  areas  outside  that  limit  can  best 
be  handled  has  not  been  determined. 

Especially with a large vocabulary and with broad system
capabilities, user education is of prime importance. It has often been
said that the statement of a problem is at least half the solution. That
scientists and technicians cannot formulate their questions is therefore
not surprising. If, however, they have at least some idea of what they
want, the system should be such that (1) the vocabulary will assist them
or (2) an intermediary searcher can assist them. The work of
Stiles, Stevens, and others (described in Section 7.4) is toward the
very large goal of having a system find material automatically even

Less ambitious, but equally important, are the efforts to translate
requests by intermediate searchers. The work of such a person, an
"information researcher", at Esso Research's Technical Information
Division is described in the example below.

"... One request read: 'Please arrange for a patent
search of methods of solidifying petroleum
fractions by reaction with stearates or stearic
acid.' If the search had been made without further
discussion of the request, it could have been a
monumental task, covering the entire recorded
information on greases and other thickened fluids.

Actually this request was found to involve only
enough general background to enable the questioner
to have a very superficial acquaintance with
thickened fluids. The request was then handled
within five minutes. However, it required about 30
minutes and several phone calls to reach mutual
understanding."*

The importance, illustrated by this example, of feedback between
user and system cannot be overstressed. Not only can the system be made
to function more efficiently if needs are known, but also the user is
not left unsatisfied and disappointed. Too often, users criticize
information retrieval systems for giving them too much or too little
information, without realizing that they are getting just what they
asked for.

11. CONCLUSION

Having  completed  the  study  and  preparation  of  a  report  to  this 
point,  Documentation  Incorporated  recognizes  an  obligation  to  set  forth 
its  own  conclusions  concerning  the  present  state  of  coordinate  indexing 
and  recommendations  concerning  future  developments. 

As indicated in the report itself, most of the literature and
controversy which have developed in this field have been concerned
with degrees of freedom or, conversely, with degrees of structure.
Although this debate has usually been concerned with connections
among terms, it has also involved the initial selection of terms. With
reference to the selection of terms, proposals have included automatic
selection of words in the title, abstract, or text; the selection of
words from a text by clerical workers; the assignment of words by
competent indexers or subject specialists; and the selection of terms
from rigid authority lists and similar devices. The words or terms
have been variously called "Uniterms", "keywords", "descriptors",
"selectors", "designators", etc., and structural elements surrounding
or added to indexing terms for the purpose of eliminating the "noise"
have been called "permuted indexes", "KWIC", and "standard title line
indexes", etc.

Although the individuals or companies that have suggested one
designation or another have usually tried to distinguish the connotation
of their name for an indexing term from all other connotations, the
public in general has had only a hazy idea of the differences, if any,
among Uniterms, keywords, descriptors, selectors, designators, etc.
There has been a similar haziness in the public's mind regarding the
specific nature or the specific differences among thesauri, subject
heading lists, authority lists, code dictionaries, etc.; among links,
roles, role indicators, interfixes, semantic factors, etc.; and among
coordinate indexes, concept coordination, multilevel indexing, and
correlative indexing.

One may conclude from this situation that much of the
controversy reduces to a proprietary interest in a particular name for
a quite ordinary activity, suggested by one individual or another or
one firm or another. It is not intended to excuse Documentation
Incorporated from this general estimation. But when one comes to
recognize what has occurred, it is a mark of wisdom, if not of valor,
to say with relief, "A plague on all our houses" and to take a fresh
look not at favorite names but at the operating problems which require
solution. The devices and apparati named can then be seen as a number
of available tools of the systems designer for solving special
operating problems in special environments. Some environments will
call for free indexing; others will call for hierarchical posting;
others will require pre-coordinations, role indicators, or links.
Some will permit free redundant indexing; others will require carefully
controlled vocabularies with elaborate structures of cross-references.

The true expert in information retrieval systems will know how to
select the proper apparatus and the proper degree of freedom to design
the best system for any particular operating environment.

It is the considered opinion of the Documentation Incorporated
study group that linguistic, semantic, and syntactic studies do not
satisfy the immediate needs of coordinate indexing. This opinion is
supported in the literature* and was found among various individuals
interviewed during the survey. This is not a universal belief, however.
A glance at the list of research efforts in scientific documentation**
discloses a considerable amount of such work. Without attempting to
resolve the question, this study group would suggest that a better
balance between linguistic studies and systems work be attempted.

To summarize, Documentation Incorporated feels that the following
recommendations are in order:

(1) That the I.R. problem be viewed in part as a
problem of the optimum design of an engineering system
and not solely as a problem of basic research into linguistics
and meaning. Such systems work would tend to bridge the
gap between linguistic research and the actual operation
of systems.

(2) That the annual preparation of a critical review
encompassing all applicable areas of information retrieval
be undertaken or sponsored by some appropriate agency.

* See Bar-Hillel, Yehoshua, "A Logician's Reaction to Recent Theorizing
on Information Search Systems," American Documentation, Vol. VIII, No. 2,
April 1957, pp. 103 - 113; and "Some Theoretical Aspects of the Mechanization
of Literature Searching," Technical Report No. 3, prepared under
ONR Contract No. N62558-2214 and under a grant from NSF, April 1960.

** See Current Research and Development in Scientific Documentation,
No. 9, National Science Foundation, Office of Science Information
Service, NSF-61-76, November 1961.


Bailar, John C. Jr., Heumann, Karl F., and Seiferle, Edwin J., "The
Use of Punched Card Techniques in the Coding of Inorganic Compounds,"
J. Chemical Education, 25, 1948, pp. 142 - 143, 176.

As part of its attempt to set up a mechanized system for correlating
molecular structure with biological activity, the Chemical-Biological
Coordination Center of the National Research Council felt it
necessary to develop codes for expressing chemical compounds in
punched cards. This paper describes the development of a code for
inorganic compounds. The problems of coding are not directly
related to the problems of coordinate indexing, but this paper is
included because it does recognize that the cards may be used to
correlate different terms, i.e., descriptions of structure and
descriptions of biological activity.


Bailey, M. F., Lanham, B. E., and Leibowitz, J., "Mechanized
Searching in the U. S. Patent Office," (presented at a meeting of
the  Division  of  Chemical  Literature,  American  Chemical  Society, 
February  1951),  J.  Patent  Office  Society,  35,  August  1953,  pp.  566 
507. 

This paper reports further experiments in the Patent Office of the
use of punched card equipment to provide "multiple categorization
of compositions" and search by one or more categories. It also
illustrates the unwillingness of the Patent Office group to depend
entirely upon a system of coordinate indexing and their use of
generic coding within each category. Thus, the system they describe
is a classification system modeled closely on the present
classification system used in the Patent Office. The machine supplements
what is possible in the manual search only to the extent of making
possible a search by more than one generic code at a time.


Bailey, M. F. and Cochran, S. W., "Patent Searching — General
Files," Punched Cards. Their Applications to Science and Industry,
edited by Robert S. Casey and James W. Perry, New York, Reinhold
Publishing Corporation, 1951, pp. 367 - 377.

This  discussion  of  patent  searching  points  out  the  difficulties  which 
arise  from  single  place  classification  systems  even  when  such  systems 
are  provided  with  cross-references.  The  authors  show  a  solution  to 
this  problem  in  the  use  of  coding  and  machines  which  would  make 
possible  searches  by  intersections  of  classes.  They  conclude  that 
standard  sorting  machines  are  not  adequate  to  the  requirements  of 
chemical  coding  and  multiple-term  searching,  as  required  by  Patent 
Office  searches. 


Bar-Hillel, Yehoshua, "A Logician's Reaction to Recent Theorizing
on Information Search Systems," American Documentation, Vol. VIII,
No. 2, April 1957, pp. 103 - 113.

An examination of some of the writings of the Western Reserve group,
Documentation Incorporated, and Calvin Mooers on the development of
new I.R. systems. Bar-Hillel states that the value of new
contributions has been exaggerated and that more work should be done to
improve traditional systems. The only real contribution he finds
in the new theories is the recognition of the simple fact that a set
of documents and their terms "form then a Boolean Algebra with
respect to the operations of complementing, intersecting, and joining."
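The three operations named in the quotation can be illustrated with ordinary sets of item numbers. This is a minimal sketch; the terms, the postings, and the size of the collection are invented for the example.

```python
# Hypothetical postings: index term -> set of item numbers posted to it.
POSTINGS = {
    "punched cards": {1, 2, 5},
    "patents": {2, 3, 5},
    "chemistry": {3, 4, 5},
}
ALL_ITEMS = {1, 2, 3, 4, 5, 6}  # the whole collection


def items(term):
    """Items posted to a term (empty set for an unknown term)."""
    return POSTINGS.get(term, set())


def intersect(a, b):
    """Logical product: items carrying both terms."""
    return a & b


def join(a, b):
    """Logical sum: items carrying either term."""
    return a | b


def complement(a):
    """Negation: items in the collection that do not carry the term."""
    return ALL_ITEMS - a


both = intersect(items("punched cards"), items("patents"))
either = join(items("punched cards"), items("patents"))
not_chemistry = complement(items("chemistry"))
```

Because these are set operations over a fixed collection, the usual Boolean identities hold; for example, the complement of a join equals the intersection of the complements.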


Barden, William A., Hammond, William and Heald, J. Heston, "Automation
of ASTIA, A Preliminary Report," AD-227 000. Arlington, Virginia,
Armed Services Technical Information Agency, December 1959, 50 pp.

"'Early considerations in automation' by William A. Barden: The
history of ASTIA's experience in planning and implementing the
automation of its functions is presented. Different ideas were
examined and discarded in a search for a more efficient method of
indexing and retrieving information. In 1953 a preliminary study
based on systems concepts embracing all the functions and services
of the Agency was conducted, but a full scale study was not possible
until 1958. When the final selection of the Remington Rand USS-90
(Univac Solid State Computer) was made, the ASTIA staff devised
methods for making optimum use of the equipment in both the
business-type and information retrieval functions.

"'Automation program' by William Hammond: The pre-automation and
automated processing of reports through ASTIA and validation of requests
of military contractors is described. The three stages by which the
automatic data processing system will be put into operation are
examined, and the process of compiling mechanized cumulative indexes
to the Technical Abstract Bulletin is presented.

"'Creation of a Thesaurus of Scientific Descriptors' by J. Heston Heald:
The main objectives of Project MARS (Machine Retrieval System) are:
(1) to prepare a Thesaurus of descriptors; and (2) to assign these
descriptors to all AD numbered reports in the ASTIA collection.
The ASTIA subject headings and subdivisions were overhauled and the
list reduced from 70,000 to about 9,000 headings, now termed
descriptors. The scope of subject coverage was divided into about
290 generic categories called display schedules. Procedures were
established for the assignment of retrieval terms, both standard
descriptors from the Thesaurus and 'open-ended terms' which will
not appear in the Thesaurus but will provide additional retrieval
access points in the form of project names, equipment nomenclature,
trade names."

[American Documentation Abstract]


Batten,  W.  E.,  "Specialized  Files  for  Patent  Searching,"  Punched 
Cards.  Their  Applications  to  Science  and  Industry,  edited  by 
Robert  S.  Casey  and  James  W.  Perry,  New  York,  Reinhold  Publishing 
Corporation,  1951,  pp.  169  -  181. 

It is from this paper that the designation "Batten systems" has
been derived to designate inverted systems using optical
coincidence as a method of search. Batten was primarily concerned with
presenting the advantages of an inverted "aspect" system as
contrasted with conventional Hollerith systems. He did not recognize
explicitly that the coincidence of holes on any two aspect cards
indicated those items which were members of a product class. In
fact, he proposed that his aspect cards be arranged not as a set
of terms but as a classification system. However, he did realize
that his classification system was mobile and that he could compare
members in one class with members in another.
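Optical coincidence can be modeled by treating each aspect card as a row of hole positions, one position per item number. The sketch below is an assumption-laden illustration, not Batten's own procedure: it represents a card as a Python integer used as a bit mask, and the function names and item numbers are hypothetical.

```python
def punch(item_numbers):
    """Make an aspect card: one hole (bit) at the position of each item number."""
    card = 0
    for n in item_numbers:
        card |= 1 << n
    return card


def superimpose(*cards):
    """Stack cards; light passes only where every card has a hole."""
    if not cards:
        return 0
    light = cards[0]
    for card in cards[1:]:
        light &= card
    return light


def read_holes(card):
    """List the item numbers at which a hole is present."""
    return [n for n in range(card.bit_length()) if (card >> n) & 1]


# Two hypothetical aspect cards with invented item numbers.
germicides = punch([3, 17, 42])
phosphorus = punch([17, 42, 90])
coincident = read_holes(superimpose(germicides, phosphorus))
```

The holes that remain open after superimposing the two cards are exactly the items common to both aspects, that is, the members of the product class.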


Bernier, Charles L., "Correlative Indexes, I - V," American
Documentation, Vol. VII, No. 4 (October 1956), Vol. VIII, No. 1
(January 1957), Vol. VIII, No. 3 (July 1957), Vol. VIII, No. 4
(October 1957), Vol. IX, No. 1 (January 1958).

Correlative indexing provides, in a nonmanipulative form, e.g.,
books or card files, the type of Boolean search permitted by a
coordinate index. It does so by printing out under each term not
only the number of the item but all the other terms used in
connection with a given term to index that item. Essentially
correlative indexing provides in book form the same type of revolution of
positions of terms developed by the Chemical-Biological Coordination
Center. Dr. Bernier also discusses problems of vocabulary control
and the type of terms to be used in correlative indexes.
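The printing rule described in this annotation (under each term, the item number together with all the other terms assigned to that item) can be sketched as follows. The item numbers, terms, and function name are hypothetical, and the layout is only one possible arrangement.

```python
def correlative_index(items):
    """Build {term: [(item number, other terms on that item), ...]}."""
    index = {}
    for number, terms in sorted(items.items()):
        for term in terms:
            others = sorted(t for t in terms if t != term)
            index.setdefault(term, []).append((number, others))
    return dict(sorted(index.items()))


# Hypothetical items and their assigned index terms.
ITEMS = {
    12: ["alloys", "corrosion", "steel"],
    30: ["corrosion", "paints"],
}

INDEX = correlative_index(ITEMS)

for term, entries in INDEX.items():
    print(term)
    for number, others in entries:
        print(f"    {number}: {', '.join(others)}")
```

A reader looking under "corrosion" sees at a glance which items also carry "steel" or "paints", so the coordination is done by eye rather than by manipulating cards.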


Bibliography in an Age of Science (Louis N. Ridenour: "Bibliography
in an Age of Science," Ralph R. Shaw: "Machines and the
Bibliographical Problems of the Twentieth Century," Albert G. Hill:
"Storage, Processing and Communication of Information"), Urbana,
University of Illinois Press, 1952.

The  reference  to  this  volume  is  included  in  this  bibliography 
because  it  represents  one  of  the  earliest  recognitions  by  the 
library  profession  itself  of  the  impact  of  machines  on  traditional 
library  activities.  The  article  by  Dr.  Ridenour  emphasizes  the 
compression  of  storage  and  the  communication  of  bibliographical 
information  from  central  depositories.  The  article  by  Dr.  Shaw 
describes  various  punched  card  devices  but  is  primarily  concerned 
with  the  Rapid  Selector,  a  microfilming  scanning  device  developed 
by  Dr.  Shaw  based  upon  a  suggestion  of  Dr.  Vannevar  Bush.  Although 
Shaw envisioned the use of the Rapid Selector to store traditional
indexes, e.g., the Index to Chemical Abstracts, the potentiality of
the Rapid Selector as a coordinate searching device was recognized
by many others.


Bohnert, Lea M., "Two Methods of Organizing Technical Information
for Search," American Documentation, Vol. VI, No. 3, July, 1955,
pp. 134 - 151.

"Two methods of organizing technical information for search
have been distinguished on theoretical grounds. One was
the traditional method of library classification. The
other method was non-hierarchical and relied on combinations
of general terms to characterize specific ideas or terms....

"So far, the second method alone has been tried either in
relatively small collections (10,000 to 50,000 documents)
or in new, i.e., marginal, fields of knowledge, such as
instrumentation. Its practicality for larger collections
(hundreds of thousands of documents), and ones in which
most of the fields of science and technology would be
involved, is still to be proven."

[Author's Conclusions]


Bracken, R. H., and Tillitt, H. E., "Information Searching with the
701 Calculator," Association for Computing Machinery Journal, Vol. 4,
No. 2, April 1957, pp. 131 - 136.

"The application of a 701 calculator is described, using
magnetic tape data storage, for the control of about
14,000 items with a coordinate index. Over 9,600
descriptors are used for subject access, and the system
is used for approximately 16 searches three times a
week. The total time for the set of 16 searches is
11 minutes. Planned modifications include the use of
a new type of tape and the substitution of a core
memory for the electrostatic memory."

[AD Abstract]


Brockway, Duncan, "Coordinate Indexing at the University of New
Hampshire Library," American Documentation, Vol. X, No. 3, July
1959, pp. 228 - 231.

This account of an experiment in coordinate indexing is of interest
because the author gives figures for density and distribution of
posting and also figures for "false drops" as a function of the
number of terms in a search. The project covered a narrow special
field and its results are probably applicable to similar special
collections.


Bush, V., (Chairman of the Committee), Report to the Secretary of
Commerce by the Advisory Committee on Application of Machines to
Patent Office Operations, Washington, Department of Commerce,
22 December 1954.

When  this  report  appeared,  it  was  considered  a  milestone  in  the 
progress  towards  mechanization  of  information  storage  and  retrieval. 
Five  recommendations  were  made  by  the  Committee  to  the  Secretary  of 
Commerce,  namely: 

1.  The  Patent  Office  should  put  machine  searching  of  compositions 
of  matter  on  an  operational  basis. 

2.  The  reclassification  of  patents  should  be  accelerated. 

3.  A  research  and  development  unit  should  be  established  in  the 
Patent  Office. 

4. The National Bureau of Standards and the Patent Office should
undertake a joint program to stimulate and develop machines
and techniques specifically adapted to the Patent Office
operations.

5. An advisory committee should be attached to the Office of the
Secretary of Commerce to stimulate and coordinate the program
and related efforts within the Commerce Department.

An  estimate  of  accomplishment  in  these  five  areas  was  made  by  a 
Committee  of  the  National  Academy  of  Sciences  -  National  Research 
Council  in  1960.  The  1960  report  indicates  that  Recommendation  5 
was  not  carried  out  and  that  Recommendations  1,  3,  and  4  require 
re-thinking  and  modification  in  the  light  of  experience  since  1954. 
However, the 1960 report did not mention any activities under
Recommendation 2 and it is this recommendation which is crucial for
the problem of coordinate indexing and mechanized search. The
Patent Office group has always felt that hierarchical classification,
rather than coordinate indexing, is the required intellectual
structure  for  mechanized  information  storage  and  retrieval.  It 
recognized  that  its  existing  classification  system  was  not  adequate 
for  mechanization  but  it  supposed  that  the  existing  classification 
system  could  be  modified  without  changing  its  essential  structure. 

It is possible that whatever activity there has been under
Recommendation 2 has operated to the detriment of accomplishment
under Recommendations 1, 3, and 4.


Casey, R. S., Bailey, C. F., and Cox, G. J., "Punched Card Techniques
and Applications," J. Chemical Education, 25, 1946, pp. 495 - 499.

Although this paper is primarily concerned with describing punched
cards and the techniques of coding and handling them, it does
recognize what it calls the possibility of using such devices to correlate
information. In considering the use of punched cards for chemical
information, the paper indicates the possibility of combining a
search for a class of materials and a class of properties or uses.


Center  for  Documentation  and  Communication  Research,  School  of 
Library  Science,  Western  Reserve  University,  "Comments  on  'A 
Logician’s  Reactions,'"  American  Documentation.  Vol.  VIII,  No.  2, 
April  1957,  pp.  117  -  122. 

Comments on "A Logician's Reaction to Recent Theorizing on
Information Search Systems."

[Cf. Bar-Hillel, "A Logician's Reaction to Recent Theorizing on
Information Search Systems;" Mooers, "Comment on Bar-Hillel's
'A Logician's Reaction to Recent Theorizing on Information
Search Systems'"]


"The Chemical-Biological Coordination Center of the National Research
Council," Washington, National Research Council, September 1954.

Although primarily devoted to a discussion of coding and a
description of the operations of the Center, this paper does contain
a brief statement of the way in which punched card equipment is used
for coordinate search:

"There is at the Center, then, a growing file of punched
cards which can be searched mechanically for variables.
Although a single criterion can be looked for, such as
a test organism or a manner of administration, it is in
the facility for search of combinations of ideas that
this method affords a major advantage over conventional
indexing. Thus, all compounds tested for a specific
response from a given organism or group of related
organisms under any of the usual variable conditions of
testing can be selected from all other compounds not
meeting those specifications."


Cherenin,  V.  P. ,  "Certain  Problems  of  Documentation  and  Mechanization 
of  Information  Search,"  (mimeographed  translation),  Moscow,  1955. 

This generalized discussion of documentation problems contains a full
account of search by the logical intersection, sum, and complement of
classes. It indicates that the ability to perform such searches is
the basis of mechanizing information search. However, the paper also
concludes that there is a requirement for a special "machine language"
which will supply grammatical and syntactical relationships of terms
in addition to the Boolean operations which form the basis of machine
search.


De Grolier, Eric, "Method for the Retrospective Searching of
Scientific Documents: A Preliminary Report," Paris, Unesco,
August 24, 1955.

"De Grolier was commissioned to prepare this paper by
the Unesco International Advisory Committee for
Documentation and Terminology in Pure and Applied Science
in consultation with the committee secretariat. Its
major sections discuss the scope of the problem;
retrospective searching from the user's standpoint; methods
of facilitating the retrieval of documents
(classification, indexing, filing, automatic selection and
codes); and organizational considerations affecting
the choice and utilization of methods. Contains
bibliographies following each section and sub-section
of the report."

[AD Abstract]


Dunham, B., "The Formalization of Scientific Languages, Part 1. The
Work of Woodger and Hull," IBM Journal of Research and Development,
Vol. 1, No. 4, October 1957, pp. 341 - 347.

"The problem of language structure in the mechanical storage
and retrieval of information is discussed. The
'formalization' of language, as attempted by Woodger and Hull,
is examined as a solution to the problem of language
structure in mechanical operations."

[AD Abstract]


Fairthorne, R. A., "Algebraic Representation of Storage and Retrieval
Languages," Proceedings of the International Conference on Scientific
Information, Washington, NAS-NRC, 1959, pp. 1313 - 1326.

"This paper has outlined a possibly useful method of representation
in which vocabularies and hierarchies of vocabularies are regarded
as the sum, in any consistent sense, of repetitive dyadic
vocabularies, not necessarily clerically realizable. It generalizes and
unifies many special methods, such as various algebraic identities
used traditionally to demonstrate number representations, and models
used in investigating some properties of ordinary language. Here it
can be used to discover the sympathetic magic principles
(like-produces-like) characteristic of linguistic systems and to apply
them usefully to more systematic vocabularies. We have seen that to
some extent it can cope, though not simultaneously, with additive
and, in general, modular properties such as cost and selective
information, and with the partial orderings and looser generalized
operations, synonymity and homonymity, that are essential to
retrieval."

[Author's Abstract]


Fairthorne, R. A., "Automata and Information," Journal of
Documentation, Vol. 8, September 1952, pp. 164 - 172.

This paper is included because of its clear account of automata and
their possible application to the storage and retrieval of information
in libraries. Fairthorne is clear that automata, devices
which operate on physical strings of information and as such
can perform only a clerical or engineering function in a library,
can contribute to the information problem by manipulating "tags" in
an index.


Fairthorne, R. A., "Delegation of Classification," American
Documentation, Vol. IX, No. 3, July 1958, pp. 159 - 164.

By "delegation of classification," Fairthorne apparently means a
scheme whereby the correct assignment of classes can be performed
by clerks or by people not directly concerned with making or
interpreting the classification. He thinks such delegation is
important because:

"It would be local in time as well as space, because no
librarian acts as his own classifier longer than his
term of office or his term of life, whichever may be
the shorter. If you have to do everything yourself,
including classification and retrieval, you do not
have to know how to do it, you only have to be able
to do it. For delegation we have to construct rules
for making non-contradictory decisions about relevance."


A  discussion  of  the  library  problem  in  terms  of  information  theory. 
Fairthorne  distinguishes  clearly  between  semantic  problems  which 
occur  at  the  level  of  the  selection  of  terms  or  the  understanding 
of  a  question  put  to  the  system  and  the  problem  of  organizing  the 
physical elements of a library, be they documents, terms, or codes.
Just as Shannon and Weaver point out the irrelevance of semantic
problems to the type of information with which communication theory
is concerned, so Fairthorne points out the irrelevance of semantic
problems to the design of information systems for libraries:

"The semantics of any library activity can be settled by
practical study; and intelligent anticipation of clients'
behaviour in bibliographical situations. Syntactic
problems of coding, and pragmatic problems of matching
the codings to operations such as marshalling, selecting,
and siteing of documents and tallies can then be considered
as clear-cut questions of cutting down average
time and labour and cost of concrete tasks."

Fairthorne  is  also  clear  that  a  system  of  indexing  can  be  interpreted 
as  a  Boolean  algebra  and  that  the  problem  of  mechanization  becomes 
one of finding the most efficient code and the most efficient
instruments for manipulating such codes.

"The  cost  and  trouble  of  working  clerical  systems  [information 
systems]  depends  not  on  what  we  say  in  them,  but  how  we  say 
it....  No  philosophical  issues  are  involved  at  this  level 
of library action. Information Theory can unify and
generalize much empirical knowledge in this field."


Fairthorne, Robert Arthur, "The Mathematics of Classification," Proc.
Brit. Soc. for International Bibliography, 9, October 14, 1947, pp.
35 - 42.

Library classification, especially the U. D. C., is presented as a
particular and limited application of the algebra of classes. The
limitation follows from the emphasis in library classification upon
the relationship of inclusion. The advantage of an exclusive concern
with a single relationship is that it permits numerical coding
of the relationship and, hence, the linear organization of materials
on shelves. If classification is considered to be more than a device
for arranging materials on shelves, it becomes apparent that
classes may stand in relationships to one another other than the
relation of inclusion. Fairthorne points out that the subject which
discusses the relationship of classes is Boolean algebra and he
indicates that library classification can profit from using other
relations described in a Boolean algebra. He points out that Boolean
algebra "is the algebra governing networks of switches; that is,
the controls of computing machinery, among other things.... As the
application of such apparatus to libraries is an urgent topic, the
consistency of the laws governing these machines with those governing
classification systems need careful consideration, which I am
not going to give in this paper, through lack of time and competence.
But it will have to be done, if complicated machinery is not to be
misapplied." Fairthorne recognizes one difficulty in the application
of the algebra of classes to problems of library classification. One
of the elementary operations of the algebra of classes is negation,
and the use of negation as an operation in information retrieval
presents very serious problems.
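
The three elementary operations of the algebra of classes can be
sketched over a small index as follows. This is an illustration only,
not taken from Fairthorne's paper: the collection, the terms, and the
function names are all hypothetical. It shows one plausible reading of
the difficulty with negation, namely that the complement is taken
against the whole collection and so tends to return most of the file.

```python
# Hypothetical postings: each index term maps to the set of document
# numbers indexed under it. All names and numbers are illustrative.
COLLECTION = set(range(1, 11))  # ten documents, numbered 1..10
POSTINGS = {
    "alloys": {1, 2, 3, 7},
    "corrosion": {2, 3, 5},
    "welding": {3, 7, 9},
}


def product(a, b):
    """Logical product (intersection): documents indexed by both terms."""
    return POSTINGS[a] & POSTINGS[b]


def logical_sum(a, b):
    """Logical sum (union): documents indexed by either term."""
    return POSTINGS[a] | POSTINGS[b]


def negation(a):
    """Complement of a term's class, relative to the whole collection."""
    return COLLECTION - POSTINGS[a]
```

Here negation("welding") returns seven of the ten documents, including
documents that were simply never indexed for welding at all, whereas
the product and the sum stay within the postings of the terms named.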


Farradane,  J.  E.  L.,  "A  Scientific  Theory  of  Classification  and 
Indexing  and  Its  Practical  Applications,"  Journal  of  Documentation, 
Vol.  6,  June  1950,  pp.  83  -  99. 

Although  this  paper  is  primarily  concerned  with  a  new  approach  to  the 
classification  of  knowledge,  it  approaches  this  problem  by  defining 
what  it  calls  "isolates"  and  "operators."  An  isolate  is  a  term  or 
subject  and  an  operator  is  a  connection  between  terms  or  subjects. 
Farradane  calls  these  operators  "logical  operators"  but  in  specifying 
their nature he describes them as "expressing appurtenance,
equivalence, reaction, and causation." Prima facie, reaction and causation
are not logical operators but terms used to describe factual
relations. Appurtenance, which means property, might be considered a
logical relation within a predicate logic, but it is not a
relationship within the algebra of classes. As for equivalence, Farradane
rules out equivalence as a relation of an item to itself and
restricts it only to the part-whole relationship, which is also
interpreted factually, rather than as a logical relationship of class
inclusion. In terms of the above, it might be supposed that
Farradane's work was irrelevant to the development of coordinate
indexing, but actually it represents an early effort to insist that
in an indexing system, it was necessary to use more than logical
relations to express factual relationships among terms. This
relates his work to subsequent work on roles and links, which function
as empirical connections among terms of an indexing system.


Farradane, J. E. L., "A Scientific Theory of Classification and
Indexing: Further Considerations," Journal of Documentation, Vol.
8, June 1952, pp. 73 - 92.

This paper, which departs still further from either a theoretical or
a practical concern with indexing and information systems, is a
continuation of "A Scientific Theory of Classification and Indexing and
Its Practical Applications." Even more than the previous paper, it
confuses problems of indexing systems with extraneous philosophical
considerations, with child psychology, epistemology, and theories of
perception. Further, Farradane has no conception of the type of
logical and mathematical considerations which are relevant to the
indexing and information problem and which have been set forth so
well by Fairthorne. The problem of organizing information in a
library is, as Fairthorne has pointed out, an engineering problem
and not a philosophical problem.


Francisco, R. L., "Use of the Uniterm-Coordinate Index System in a
Large Industrial Concern." Presented before the Metals Division,
Special Libraries Association, Philadelphia, October 20, 1955.

"The Uniterm System has been used for more than a year and
a half in indexing technical reports in the Technical Data
Center of the General Electric Company, a Center which has
over 150,000 reports in its files. The System is described
and two pitfalls are discussed; namely, the importance of
avoiding the use of synonyms, which will tend to lose
information, and also the necessity of avoiding terms so
general that they encompass the entire library. In avoiding
the use of synonyms, it is helpful to keep a dictionary
of terms used. The author states that they have never yet
encountered any problem with 'false drops.'"

[AD Abstract]


Gamble, D. T., "A Coordinate Index of Organic Compounds," (presented
before the Division of Chemical Literature, 127th Meeting of the
American Chemical Society, Cincinnati, Ohio), March 31, 1955.

The major interest of this paper lies in the fact that it is one of
the first concrete demonstrations of the convertibility of
conventional and inverted systems. The paper describes an experiment
in which 3,000 compounds were broken down in a manner similar to
that established by the Chemical-Biological Coordination Center,
but the data were entered on Uniterm cards. This produced a file
of 263 cards representing the terms in the system, on which the
3,000 compounds were posted. Like the CBCC system, the coordinate
index described in this paper permitted only searches by logical
functions of the terms, and hence some noise might result in a
search for compounds having a particular arrangement of the
functional groups described by the terms in a search. Although the author
used Uniterm cards, he does point out that an identical system
could be established using Peek-a-boo cards. Here again is one of
the early realizations of the logical identity of different
mechanisms used in coordinate search.
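
The convertibility described above can be sketched as follows. This is
an illustration under assumed data, not Gamble's procedure: the
compound numbers, the term names, and the function names are
hypothetical. A conventional file holds one record per compound listing
its terms; the inverted file holds one card per term listing the
compounds posted on it. A non-empty conjunctive search gives the same
answer against either file.

```python
from collections import defaultdict

# Hypothetical conventional file: compound number -> structural terms.
ITEM_FILE = {
    101: {"hydroxyl", "phenyl"},
    102: {"carboxyl", "phenyl"},
    103: {"carboxyl", "hydroxyl", "phenyl"},
}


def invert(item_file):
    """Build the inverted file: one 'card' per term, listing compounds."""
    term_file = defaultdict(set)
    for item, terms in item_file.items():
        for term in terms:
            term_file[term].add(item)
    return dict(term_file)


def search_items(item_file, query):
    """Conventional search: scan every record for all query terms."""
    return {item for item, terms in item_file.items() if query <= terms}


def search_terms(term_file, query):
    """Inverted search: intersect the postings of the query terms.

    Assumes a non-empty query.
    """
    postings = [term_file.get(term, set()) for term in query]
    return set.intersection(*postings) if postings else set()
```

A search on "carboxyl" and "hydroxyl" retrieves compound 103 from
either file, but neither file says how the two groups are arranged in
the molecule, which is the source of the noise noted above.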


Garfield,  Eugene,  "Preliminary  Report  on  the  Mechanical  Analysis 
of  Information  by  Use  of  the  101  Statistical  Punched  Card  Machine," 
American Documentation, Vol. V, No. 1, January 1954, pp. 7 - 12.

The  Welch  Medical  Library  Indexing  Project,  sponsored  by  the  Armed 
Forces  Medical  Library  (now  the  National  Library  of  Medicine),  was 
concerned  with  the  study  of  the  general  problem  of  indexing  the 
world's  medical  serials  and  with  developing  techniques  for  both 
mechanized search and mechanized print-out of indexes. Garfield's
paper  is  based  upon  his  participation  in  the  Welch  Medical  Library 
Indexing  Project.  The  Project  utilized  an  IBM  101  and  by  ingenious 
wiring  of  this  device,  Garfield  was  able  to  conduct  searches  by  a 
many-term  question  with  any  specified  Boolean  function  relating 
the  terms.  What  Garfield  achieved  was  a  form  of  superimposed  wiring, 
logically  equivalent  to  the  superimposed  coding  developed  by  Calvin 
Mooers.  It  is  known  in  the  art  that  superimposed  coding  requires 
random  codes  and  the  Welch  Medical  Library  Indexing  Project  used 
generic  codes.  With  generic  codes  it  is  impossible  to  calculate  in 
any  meaningful  sense  the  percentage  of  false  drops  which  will  occur 
from  any  given  degree  of  superimposition.  Perhaps  this  is  why 
Garfield  does  not  discuss  this  problem  in  his  paper. 
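
The point about random codes can be sketched as follows. This is an
illustration only, not Garfield's wiring or Mooers' published scheme:
the field size, the number of marks per term, and all function names
are assumptions. Each term receives a pseudo-random pattern of marks,
the patterns of all terms on a card are superimposed, and a card is
selected when it contains every mark of the question. Only because the
marks are random can the false-drop rate be estimated at all; the
estimate below also assumes the marks are independent.

```python
import random

CODE_BITS = 40      # positions in the coding field (assumed)
MARKS_PER_TERM = 4  # positions marked for each term (assumed)


def term_code(term):
    """Pseudo-random pattern for a term, reproducible from its spelling."""
    rng = random.Random(term)
    return frozenset(rng.sample(range(CODE_BITS), MARKS_PER_TERM))


def card_code(terms):
    """Superimpose (OR together) the patterns of all terms on one card."""
    code = set()
    for term in terms:
        code |= term_code(term)
    return code


def matches(card, query_terms):
    """A card is selected when it contains every mark of the question."""
    return card_code(query_terms) <= card


def expected_false_drop_rate(terms_per_card, query_terms=1):
    """Approximate chance that an unrelated card passes the question."""
    marked = 1 - (1 - MARKS_PER_TERM / CODE_BITS) ** terms_per_card
    return marked ** (MARKS_PER_TERM * query_terms)
```

With generic codes the marks of related terms overlap by design, so
the independence assumption behind expected_false_drop_rate fails and
no comparable figure can be computed.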


Herner, Saul, and Meyer, Robert, "Classifying and Indexing for the
Special Library," Science, Vol. 125, No. 3252, April 26, 1957, pp.
799 - 803.

The authors propose to correct the recognized disadvantages of general
classification systems, not through the use of coordinate indexing,
but by preparing special classifications designed for special purposes
and special groups of users. A description of such a special
classification is given.


Holmstrom, Dr. J. E., "A Classification of Classifications," (Paper
presented at Berne Conference of International Federation for
Documentation, 1947, and published in its report; since slightly amended).
The  Royal  Society  Scientific  Information  Conference,  21  June  -  2  July 
1948,  Report  and  Papers  Submitted,  London,  The  Royal  Society,  1948, 
pp.  501  -  515. 

This  paper  is  one  of  the  earliest  to  distinguish  between  library 
classification,  alphabetical  subject  heading,  and  what  Holmstrom 
calls  mechanical  selection.  Although  Holmstrom  recognizes  that 
mechanical  devices  present  "the  possibility  of  sifting  out  from  an 
accumulated  mass  of  records,  at  any  time,  any  desired  conjunctions 
[Italics  his]  of  information",  he  does  not  see,  as  did  Fairthorne, 
that  the  conjunction  is  essentially  a  product  relation  as  described 
in  the  algebra  of  classes  and  that  it  is  only  one  of  a  number  of 
possible logical functions of classes. Holmstrom discusses two
types of apparatus for achieving conjunctions, namely, manual
keysort and the Batten system, which uses optical matrices. These
devices can handle logical products much more readily than they can
handle logical sums or other types of logical functions. This may
explain Holmstrom's exclusive emphasis on conjunction.


Jonker, Frederick, "The Descriptive Continuum: A 'Generalized'
Theory of Indexing," Washington, AFOSR TN 57-287, June 1957.*

* This paper also appeared in Proceedings of the International
Conference on Scientific Information (Nov. 16-21, 1958),
Washington, NAS-NRC, 1959.

"The generalized theory of indexing postulated in this
article... looks upon all indexing systems as a continuum,
the descriptive continuum. The main parameter
of this continuum is the average length of the 'entries'
or 'headings' used. At one end of the continuum or
'spectrum' is keyword indexing; subject heading indexing
is somewhere in the middle, while hierarchic classifications
are at the other extreme. The average length
of the headings or descriptive terms used determines
the position in the continuum.

"Throughout  the  continuum,  all  other  parameters  behave 
as functions of the average term-length. Some of these
parameters are:

--Potential depth of indexing
--Permutability of indexing criteria
--Degree of hierarchical definition of indexing
--Potential need for a coordinating mechanism
--Retrieval noise
--Size of the access apparatus
--False coordinations
--Capacity for handling semantic indeterminacy.

"These  parameters  are  discussed  and  explained.  They 
are  believed  to  contain  all  the  considerations  basic 
to  the  indexing  problem. 

"The  theory  indicates  that  once  the  main  parameter, 
average  term  length,  is  determined,  all  other 
properties  of  the  indexing  system  are  fixed.  For 
every  information  collection  there  is  an  'optimum' 
position  in  the  continuum,  according  to  which  the 
collection  should  be  organized.  This  optimum 
position  is  determined  by  the  diffuseness  of  the 
information  in  that  particular  field." 

[Author's  Abstract] 


Joyce, T., and Needham, R. M., "The Thesaurus Approach to
Information Retrieval," American Documentation, Vol. IX, No. 3, July
1958, pp. 192 - 197.

The use of a thesaurus is recommended as providing retrieval not
merely on a yes-or-no basis but in terms of degrees of relevance.
The degrees of relevance are presumably established by setting up
relationships among terms in the thesaurus. So far as the operation
of retrieval is concerned, it still proceeds on a yes-or-no basis.


King, Gilbert S., "Applications of Punched-Card Methods to Scientific
Computations," Punched Cards, Their Applications to Science and
Industry, edited by Robert S. Casey and James W. Perry, New York,
Reinhold Publishing Corporation, 1951, pp. 407 - 422.

Although this paper is primarily concerned with handling scientific
data and scientific computations on punched card equipment, it is,
as has been noted in the text, one of the earliest papers to
recognize explicitly that an understanding of machine possibilities could
be derived from a study of symbolic logic.

"The machine carries out operations of symbolic logic,
and for proper coding and programming the scientist
should know the basic algebra of the black boxes into
which he will put cards."

Although  Dr.  King  does  not  expand  on  this  point,  it  is  clear  that 
if  the  machine  operates  by  performing  logical  functions,  a  search 
should  be  designed  in  terms  of  such  logical  functions. 


Luhn, H. P., "The Automatic Creation of Literature Abstracts," IBM
Journal of Research and Development, Vol. 2, No. 2, April 1958, pp.
159 - 165.

"Excerpts of technical papers and magazine articles that serve
the purposes of conventional abstracts have been created
entirely by automatic means. In the exploratory research
described, the complete text of an article in machine-readable
form is scanned by an IBM 704 data-processing machine and
analyzed in accordance with a standard program. Statistical
information derived from word frequency and distribution is
used by the machine to compute a relative measure of
significance, first for individual words and then for sentences.
Sentences scoring highest in significance are extracted and
printed out to become the 'auto-abstract'."

[Author's  Abstract] 
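
The procedure summarized in this abstract can be sketched as follows.
This is a simplified illustration, not Luhn's program: the stop-word
list, the scoring rule (mean frequency of the significant words in a
sentence), and the function name are assumptions, and the abstract
itself does not state how the significance measure was computed.

```python
import re
from collections import Counter

# Assumed list of common words carrying no significance.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "by", "for"}


def significant_words(text):
    """Lower-case words of the text, with common words removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower())
            if w not in STOP_WORDS]


def auto_abstract(text, n_sentences=1):
    """Return the n highest-scoring sentences, in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text)
                 if s.strip()]
    freq = Counter(significant_words(text))  # significance of each word

    def score(sentence):
        tokens = significant_words(sentence)
        return sum(freq[w] for w in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:n_sentences])]
```

Scoring by mean rather than total frequency is a choice made here so
that long sentences are not favored merely for their length.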


Luhn, H. P., "The Automatic Derivation of Information Retrieval
Encodements from Machine-Readable Texts," Yorktown Heights, N. Y.,
International Business Machines Corporation, September 6, 1959, 9 p.

"The problem of organizing scientific and technical
literature for purposes of information retrieval may
be alleviated either by (1) the adoption of a common
machine coding language to avoid conflicts and duplication
or (2) the development of completely automatic
methods for deriving clues to document contents. In
arguing for the latter course, this paper reviews
some of the drawbacks of the common-language approach
and suggests several approaches to automation based
on statistical methods. These include the derivation
of similarity coefficients from word-frequency lists
and indexing by the automatic printout of key words
in their context."

[Author's Abstract]


Luhn, H. P., "General Rules for Creating Machinable Records for
Libraries and Special Reference Files," Yorktown Heights, N. Y.,
International Business Machines Corporation, September 30, 1959,
8 p.

"The purpose of establishing general rules for creating
machinable records for libraries and special reference
files is to insure the greatest possible utility of a
basic record in the many phases of documentation,
library services, the storage, dissemination and
retrieval of information. If, at the first instant of
accession of a document, a duly complete machinable
record thereof is manually produced, this should be
the last time that human effort is expended in view
of the facility with which modern information
processing devices can adapt this information to all
subsequent requirements that might arise.

"In order that such a master record fulfill these
requirements, its format must be substantially unbiased
with respect to any specific preconceived system it
is to serve. The adoption of a standard format of an
original record within an organization and, possibly
among organizations, will permit the exchange of such
records, yielding additional savings in effort, and
will insure complete freedom in devising individual
systems most suitable for given situations. Also,
in those cases where systems have become inadequate
or did not perform as expected, the modification or
redesign of such systems may be carried out without
renewed manual effort since new types of records
can always be derived from the master records by
automatic means. Finally, work on the creation of
a collection of records may safely be started long
before a particular system for processing them has
been established."

[From  the  text] 


Luhn, H. P., "The IBM Electronic Information Searching System," IBM
Research Center, Yorktown Heights, New York, February 15, 1952.

This paper is one of the earliest and most thorough attempts to
describe a completely mechanized searching system. It covers coding
problems, input problems, and searching problems, and describes a
special machine which has come to be known in the art as the Luhn
Scanner. The Luhn Scanner, by its use of complementary coding and
the movement of the cards being searched against the question card,
eliminated the requirement for fixed field coding and provided for
searching by the three Boolean operations of product, sum, and
complement. The machine could also be instructed to search by any
subset of terms without specifying the terms included in the
subsets. The paper, "A New Method of Recording and Searching
Information," constitutes the appendix to this paper.


Luhn, H. P., "IBM Punched Card System for Indexing and Classifying
Information and Method of Searching and Analyzing Such Information,"
March 25, 1948.

This unpublished paper, prepared by Mr. Luhn in March of 1948, is an
early statement of the theory of coding and searching which was
reduced to practice in the Luhn Scanner and in several earlier
prototypes. Mr. Luhn was concerned with developing a method of coding
which would permit several terms to be coded on the same card without
the requirement that the terms be placed on the card and searched for
in any specific order. He was also concerned with the ability to
search for material indexed by any combination of terms on the card.
Although the details of coding are not given, the coding is stated
to provide economical storage of data along with the possibility of
print-out, sorting, and other operations carried out by standard
IBM machines. Although Mr. Luhn recognized the importance of free
indexing, he did indicate that his coding system could provide for
generic relations among the terms of the system.


Luhn, H. P., "The IBM Universal Card Scanner for Punched Card
Information Searching Systems," Yorktown Heights, N. Y., International
Business Machines Corporation, November 17, 1958, 24 p.

"An electronic machine that answers the requirements of
information retrieval by scanning of punched cards has
been developed by IBM. This machine, called the
'Universal Card Scanner' (UCS), scans cards fed through
it in a manner similar to that employed on conventional
punched card sorters. It is capable of discovering
whether any one or several of a given set of patterns
are wholly or partly contained in any of the record
cards scanned. This function is performed by a
'no-pulse matching' process under the control of a 'question
card' which contains prototypes of the patterns sought,
likewise represented by punched holes. This is the
adaptation of an electronic method to the optical
principle of 'matching by black-out', employed in an
earlier experimental IBM card scanning machine, frequently
referred to as the 'Luhn Scanner'. As was the case
in the earlier model, the present machine features the
use of a punched IBM card (Question Card) for furnishing
the patterns to be searched for in a record file."

[From the text]


Luhn,  H.  P.,  "Identification  of  Geometric  Patterns  by  Topological 
Description  of  their  Envelopes,"  Poughkeepsie,  N.  Y.,  International 
Business  Machines  Corporation,  April  23,  1956,  6  p. 

"The  use  of  electronic  equipment  in  the  field  of  literature 
searching and of correlation of information has pointed up
the need of linear notations for multidimensional
representations such as chemical structures. This paper proposes
a notation for the identification of geometric patterns such
as used for representing chemical structures. The notation
is based on a topological description of the envelope
enclosing such configurations."

[Author's Abstract]


Luhn, H. P., "Keyword-In-Context Index for Technical Literature
(KWIC Index)," Yorktown Heights, N. Y., International Business
Machines Corporation, August 31, 1959, 16 p.

"A distinction is made between bibliographical indexes
for new and past literature based on the willingness
of the user to trade perfection for currency. Indexes
giving keywords in their context are proposed as suitable
for disseminating new information. These can be
entirely machine-generated and hence kept up-to-date
with the current literature. A compatible coding
scheme to identify the indexed documents is also
proposed. In it elements are automatically extracted
from the usual identifiers of the document so that
the coded identifier yields a maximum of information
while remaining susceptible to normal methods of
ordering."

[Author's  Abstract] 
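
A keyword-in-context listing of the kind described can be sketched as
follows. This is an illustration only, not the IBM program: the
stop-word list, the column width, the document identifiers, and the
function name are assumptions, and the coded identifier scheme
mentioned in the abstract is not reproduced.

```python
# Assumed list of words too common to serve as index entries.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "the", "to"}


def kwic(titles, width=30):
    """Return (line, doc_id) pairs, one per keyword, sorted by keyword.

    `titles` maps a document identifier to its title. Each line shows
    the keyword with up to `width` characters of context on each side.
    """
    entries = []
    for doc_id, title in titles.items():
        words = title.split()
        for i, word in enumerate(words):
            if word.lower() in STOP_WORDS:
                continue
            left = " ".join(words[:i])[-width:]
            right = " ".join(words[i + 1:])[:width]
            line = f"{left:>{width}}  {word} {right}"
            entries.append((word.lower(), line, doc_id))
    entries.sort(key=lambda e: (e[0], e[2]))
    return [(line, doc_id) for _, line, doc_id in entries]
```

Because every significant word of every title yields its own entry, the
listing needs no human indexing decision, which is what allows it to be
kept current with the literature.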


Luhn, H. P., "A New Method of Recording and Searching Information,"
(Presented at American Chemical Society Meeting September 11, 1951).
American Documentation, 4, January 1953, pp. 14 - 16.

By the use of overlapping circles, Mr. Luhn illustrates how any
topic can be identified by a set of terms and how related topics
which are indexed by subsets of the terms can be indicated. The
new method, which involves both indexing and retrieving by "a
plurality of aspects," is contrasted with conventional methods of
indexing and classifying. Luhn also points out the manner in which
a relatively small set of terms can, in combination, describe
uniquely millions of diverse topics. Finally, Luhn proposes that
by varying the number of terms used in a search, a search can be
made as generic or as specific as one pleases.


Luhn, H. P., "Potentialities of Auto-Encoding of Scientific
Literature," Yorktown Heights, N. Y., International Business Machines
Corporation, May 1959, 22 p.

"The introduction of mechanical devices for the processing
of scientific information raises the question as to the
extent to which machines will be able to assist in the
selection, storage, dissemination and retrieval of
information. In order to appreciate fully the functions that
information processing machines are capable of performing
in this area a number of typical operations are presented
and their potential usefulness to the development phase
as well as operational phases of information systems is
explored. The solution of particular problems is
illustrated by way of examples based on the availability of
scientific literature in machine-readable form. The
examples cover the compilation of word lists,
establishment of word relationships, the preparation of word
patterns for retrieval and compilation of dictionaries
and thesauri. Some of the results of Information
Retrieval Research at the IBM Research Center are
presented in the form of machine print-outs such as
the keyword-in-context index for bibliographies, the
auto-abstract, the word pair matrix, derived code words,
and the statistical analysis of a document."

[Author's  Abstract] 


Luhn, H. P., "Row-by-row Scanning Systems for IBM Punched Cards as
Applied to Information Retrieval Problems," Yorktown Heights, N. Y.,
International Business Machines Corporation, May 1959, 37 p.

"'The row-by-row method of recording obviates the need of
superimposed coding and overcomes the disadvantages
previously enumerated. Alphabetic or numeric
information may be spelled out by character and may
therefore be uniquely matched during the scanning process,
thereby eliminating incidences of false selection. It
is furthermore possible to express relationships
amongst recorded items in many ways, to indicate ranges
of values, alternative conditions and many other
features. Also, the recorded information may always
be recovered, which is not possible in superimposed
coding schemes'. Provides details on row-by-row
coding schemes, typical scanning codes and their
assembly into rows, and preparation of cards. Describes
the IBM 101 Electronic Statistical Machine with row-by-row
attachment and gives examples of applications."

[AD  Abstract] 


Luhn, H. P., "Selective Dissemination of New Scientific Information
with the Aid of Electronic Processing Equipment," Yorktown Heights,
N. Y., International Business Machines Corporation, November 30,
1959, 19 p.

"Improvement of scientific communication is sought through
machine assisted dissemination of new information. A
service system is described in which a new document is
characterized by a vocabulary or pattern of keywords.
This pattern is then compared with the vocabularies or
profiles characterizing each of the participants of the
service. If a given degree of similarity exists between
the two, the affected participants are notified
by a card carrying an abstract. The recipient signifies
whether the information is in fact relevant or not by
returning or not returning a stub provided with the
card. His affirmative response, which may include his
request for a copy of the document, is reflected on his
profile by incorporating the pattern of the accepted
item. Profiles are kept current by discarding patterns
after they have reached a certain age. The feedback
includes notification of authors as to the reception
of their work. The service also facilitates participants'
referral of information to others and generally
endeavors to promote interchange of information by
personal contact."

[Author’s  Abstract] 
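
The matching step described in this abstract can be sketched as follows. This is a minimal illustration and not Luhn's procedure: the abstract does not say how the "degree of similarity" is measured, so the overlap fraction, the threshold of 0.5, and every name in the sketch are assumptions. The aging of patterns is left out.

```python
def similarity(doc_keywords, profile_keywords):
    """Fraction of the document's keywords also found in the profile.

    This measure is an assumption; the abstract does not define one.
    """
    doc = set(doc_keywords)
    if not doc:
        return 0.0
    return len(doc & set(profile_keywords)) / len(doc)


def notify(doc_keywords, profiles, threshold=0.5):
    """Return the participants whose profile is similar enough to the document."""
    return sorted(
        name
        for name, profile in profiles.items()
        if similarity(doc_keywords, profile) >= threshold
    )


def accept(profile, doc_keywords):
    """Fold the pattern of an accepted document into a participant's profile."""
    return set(profile) | set(doc_keywords)


# Hypothetical participants and a hypothetical new document.
profiles = {
    "chemist": {"polymer", "catalyst", "solvent"},
    "librarian": {"indexing", "retrieval", "thesaurus"},
}
new_doc = ["indexing", "retrieval", "punched", "cards"]

recipients = notify(new_doc, profiles)
for name in recipients:
    # Assume every notified participant returns the stub.
    profiles[name] = accept(profiles[name], new_doc)
print(recipients)
```

Each accepted document enlarges the recipient's profile, which is the feedback loop the abstract describes.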


Luhn, H. P., "A Serial Notation for Describing the Topology of
Multidimensional Branched Structures (Nodal Index for Branched
Structures)," Poughkeepsie, International Business Machines
Corporation, December 12, 1955.

Many users of coordinate systems have felt that the proper encoding
of chemical compounds required ordered relationships among the terms,
as contrasted with the commutative relationships of a Boolean algebra.
This paper is one of the earliest descriptions of a system which
imports order into the relationship among the terms of the system.

"The systems refer to branched and interconnected arrays
of linear elements, such as used in delineating chemical
structures, the flow of processes, or the assembly of
mechanical and electrical circuit elements. A primary
objective of these systems is to derive a linear notation
which permits the discovery of inclusion of a given
structure in another structure by serial comparison of
the elements of the respective notations (serial scanning).
Such serial comparisons may be performed by machines without
the aid of an internal memory. Another objective is
the derivation of a unique notation for purposes of
identification."

[Author's  Abstract] 

A  similar  scheme  was  developed  at  the  Bureau  of  Standards  for  coding 
compounds  in  the  patent  literature. 

[Cf. Ray, L. C., and Kirsch, R. A., "Finding Chemical Records by
Digital Computers"]


Luhn, H. P., "A Statistical Approach to Mechanized Encoding and
Searching of Literary Information," IBM Journal of Research and
Development, Vol. 1, No. 4, October 1957, pp. 309 - 317.

This paper is largely concerned with the problem of setting up a
vocabulary of "notions" and establishing related families of terms
for use in both encoding and searching. The relationship of terms
is to be determined by a statistical study of their use with one
another.


Maron,  M.  E.,  "Automatic  Indexing:  An  Experimental  Inquiry,"  Santa 
Monica,  The  Rand  Corporation,  August  10,  1960,  37  p. 

"This inquiry examines a technique for automatically
classifying (indexing) documents according to their
subject content. The task, in essence, is to have
a computing machine read a document and on the basis
of the occurrence of selected clue words, decide to
which of many subject categories the document in
question belongs.

"This paper describes the design, execution and evaluation of
a modest experimental study aimed at testing empirically one
statistical technique for automatic indexing."

[Author’s  Summary] 


Mooers,  Calvin  N.,  "Choice  and  Coding  in  Information  Retrieval 
Systems,"  Transactions  of  the  I.  R.  E.  for  1954,  pp.  112  -  118. 


"Information retrieval systems are susceptible to treatment
by communication theory at the coding and machine level,
and there are a number of analogies between retrieval
systems and multiplex signalling systems. Historically,
retrieval theory has been aided by communication theory.
In the other direction, there is reason to believe that
developments -- both theoretical and practical --
originally made with retrieval systems may be applicable
to the development of signalling systems. For instance,
some retrieval practice seems to be ahead of work in
asynchronous multiplex signalling. For another thing,
techniques for handling semantic information in retrieval
-- not discussed here -- may be suggestive for
further development in communication theory if and when
such matters are undertaken."

[Author’s  Abstract] 


Mooers, Calvin N., "Coding, Information Retrieval, and the Rapid
Selector,"  American  Documentation,  Vol.  I,  No.  4,  October  1950,  pp. 
225  -  229. 

This paper by Mooers discusses the type of coding suggested in the
paper by Wise and Ferry, "Multiple Coding and the Rapid Selector."
It points out that word coding does not result in random codes and
therefore the number of false drops may be much higher than expected.
Mooers also objects in this paper to the use of the terms
"polydimensional" or "dimensional" to describe coordinate indexing
on the grounds that the terms or descriptors in an indexing system
do not fall in definite series or dimensions. On this point, see
Taube, Mortimer, "The Functional Approach to Bibliographic
Organization."


Mooers, Calvin N., "Comment on Bar-Hillel's 'A Logician's Reaction
to Recent Theorizing on Information Search Systems,'" American
Documentation, Vol. VIII, No. 2, April 1957, pp. 114 - 116.

A reply to Mr. Bar-Hillel's "A Logician's Reaction to Recent
Theorizing on Information Search Systems."


Mooers, Calvin N., "Information Retrieval on Structured Content,"
presented  at  the  Third  London  Symposium  on  Information  Theory, 
sponsored  by  the  Department  of  Electrical  Engineering  of  the 
Imperial  College  of  Science  and  Technology  and  held  at  the  Royal 
Institution,  September  12-16,  1955. 

This paper represents Mooers' contribution to the discussion of
ordered relations among terms in contrast to a pure coordinate
indexing system. Mooers proposes to group descriptors into
interlocking sets, which he calls "n-tuples." Each such set involving
a number of terms in an ordered relation in effect constitutes one
term which can be coordinated with any other set.


Mooers, Calvin N., "A Mathematical Theory of Language Symbols in
Retrieval," Proceedings of the International Conference on Scientific
Information, Washington, NAS-NRC, 1959, pp. 1327 - 1564.

"A mathematical model is presented which relates the
language symbols of retrieval to the documents retrieved.
The model is applied to three families of retrieval
systems: those using for language symbols (1) descriptors,
(2) characters with hierarchy, and (3) characters with
logic. Most information retrieval systems now in use are
variations of one of these systems. The similarities and
differences between the three systems are displayed by the
model. According to the model, a retrieval prescription
is represented by a point in a space P. This space can be
generated by taking the cardinal product of a repertory of
simple partially ordered systems. The output of the
retrieval system is a subset of documents, and each of these
is represented by a transformation from a point in space P
to a point in space L. Two different retrieval
transformations are defined. Future elaborations and extensions
of the model are outlined."

[Author's Abstract]


Mooers, Calvin N., "Zatocoding Applied to Mechanical Organization
of Knowledge," American Documentation, Vol. II, No. 1, January 1951,
pp. 20 - 32.

In addition to its description of the Zator Selector and Mooers'
method of superimposed coding, this paper is one of the earliest
presentations of the method of indexing by separate descriptors
and of retrieving information by searching for a combination of
descriptors. Because Mooers used superimposed coding in a single
field, he limited his description of searches to products of
descriptors. The percentage of false drops in the system went down
as the number of terms in a question went up. On the other hand,
any question involving a sum of descriptors would so increase the
number of false drops as to make the system unusable. It is perhaps
for this reason that Mooers restricted his discussion to
product searches.
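
The behavior of false drops under product and sum searches can be tried out with a small simulation. What follows is a sketch of superimposed coding in general and not a reconstruction of the Zator Selector: the field width, the number of marks per descriptor, the size of the file, and all function names are assumptions, and the counts it prints depend on those assumed values.

```python
import random

FIELD_BITS = 40       # assumed width of the single coding field
MARKS_PER_TERM = 4    # assumed number of marks per descriptor


def make_codebook(descriptors, seed=0):
    """Assign each descriptor a random pattern of marks in the field."""
    rng = random.Random(seed)
    return {d: frozenset(rng.sample(range(FIELD_BITS), MARKS_PER_TERM))
            for d in descriptors}


def superimpose(terms, codebook):
    """Overlay (logical OR) the patterns of several descriptors."""
    marks = set()
    for term in terms:
        marks |= codebook[term]
    return marks


def product_search(cards, query_terms, codebook):
    """Select cards whose field contains every mark of the query pattern."""
    wanted = superimpose(query_terms, codebook)
    return {cid for cid, marks in cards.items() if wanted <= marks}


def sum_search(cards, query_terms, codebook):
    """Select cards that match the pattern of at least one query descriptor."""
    selected = set()
    for term in query_terms:
        selected |= product_search(cards, [term], codebook)
    return selected


def false_drops(selected, truly_relevant):
    """Cards selected by their marks but not indexed by the query terms."""
    return selected - truly_relevant


# Hypothetical file: 200 cards, each indexed by 5 of 30 descriptors.
rng = random.Random(1)
vocabulary = [f"d{i}" for i in range(30)]
codebook = make_codebook(vocabulary)
index = {cid: set(rng.sample(vocabulary, 5)) for cid in range(200)}
cards = {cid: superimpose(terms, codebook) for cid, terms in index.items()}

for query in (["d0"], ["d0", "d1"], ["d0", "d1", "d2"]):
    relevant = {cid for cid, terms in index.items() if set(query) <= terms}
    hits = product_search(cards, query, codebook)
    print("product", query, "selected", len(hits),
          "false drops", len(false_drops(hits, relevant)))

query = ["d0", "d1", "d2"]
relevant = {cid for cid, terms in index.items() if set(query) & terms}
hits = sum_search(cards, query, codebook)
print("sum", query, "selected", len(hits),
      "false drops", len(false_drops(hits, relevant)))
```

Adding a term to a product search can only add marks to the query pattern, so the selected set can only shrink. A sum search takes the union of the single-term selections and therefore inherits the false drops of every term in it.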


Morris, J. C., "The Duality Concept in Subject Analysis," American
Documentation, Vol. V, No. 3, August 1954, pp. 117 - 146.

This paper is a defense of traditional library subject headings
analysis against the claims of coordinate indexing. It argues that
the elimination of connectives, word order, and grammatical variants
of indexing terms and the exclusive use of terms and Boolean
functions will lead to the scattering of information in an index
and a high rate of noise in retrieval. Much of the later criticism
of pure coordinate indexing followed the arguments set forth in this
paper.


Morris, J. C., "Evolution or Involution? Notes Critical of the
Uniterm System of Indexing," J. Cataloging and Classification, 10,
July 1954, pp. 111 - 118.

"The Uniterm system of coordinate indexing is an innovation
in subject analysis. Some of the features inherent in the
system belie the claims made by its proponents, particularly
as to dependability for subject retrieval and as to speed or
ease of searching. Some of the administrative directives
and rules for setting up Uniterm indexes appear almost
certain to lead to superficial rather than intensified
subject indexing. These two factors taken together indicate
that such a system set up as recommended would be its own
worst enemy and would be self-defeating in the long run."

[Author's  Abstract] 

The author points out that the four major devices for the retrieval
of information in libraries have been card catalogs, indexing
systems, classification schemes, and subject bibliographies. These
are interdependent and each has met "with some degree of success the
logical criteria of subject analysis which have evolved."

The major object of the paper, however, is to review critically
"statements and implications" which have been made in the Installation
Manual for the Uniterm System of Coordinate Indexing (Armed Services
Technical Information Agency, Dayton, O., 1953). Such matters as
matching numbers, the basic vocabulary of Uniterms, the density of
postings, relationships between concepts or ideas, and the
"false-drop" problem are discussed.


Nolan, J. J., "Information Storage and Retrieval Using a Large Scale
Random Access Memory," (Presented before the American Chemical
Society, April 15, 1958). American Documentation, Vol. 10, No. 1,
January 1959, pp. 27 - 35.

"Describes the application of the IBM 305 RAMAC as an information
retrieval tool, demonstrating how random entry to
such a large-capacity memory, and the associated programmable
features, may make coordinate searching of large collections
practical by offering the possibility for overcoming such
problems as the recognition of specific-generic relationships
and false association between search terms."

[AD  Abstract] 


Opler,  Ascher,  "Dow  Refines  Structural  Searching,"  Chemical  and 
Engineering  News,  Vol.  35,  No.  33,  August  19,  1957,  pp.  92  -  96. 

"Staff at Dow Chemical Company has been developing a system
for searching coded chemical compounds for desired structural
features, using a high-speed digital computer (the IBM 704).
A general searching program has been written to take care of
90 percent of the searches requested by Dow chemists; the
other 10 percent can be handled by writing special programs
or by modifying the general program. 10,585 compounds have
been coded and recorded on magnetic tape so far. Experience
during the past two years has shown the feasibility and
accuracy of machine searching and the capabilities and
limitations of the code. The staff has developed the
multiplexing principle for conducting simultaneous searches.
This approach consists of taking a number of searches in a
group and comparing them with a number of structures in a
group. With the IBM 704, a group of searches can be compared
with a group of 120 structures at a time. The search
criteria are considered to be four hurdles or acceptance
tests which each compound under examination must pass in
answering the search."

[AD Abstract]


Paden, B. R., "Information Retrieval on Automatic Data Processing
Equipment," Special Libraries, Vol. 50, No. 4, April 1959, pp. 162 -
165.


"The first in a series of four articles by a senior
mathematician programmer with the IBM Corporation explaining some
fundamentals underlying mechanized retrieval methods,
presented simply and with examples to help clarify each point.
Discusses the breaking down of subject headings into
descriptors suitable for coordinate searching, emphasizing
that 'the development of a set of descriptors adequate to a
particular application is a major part of the battle.'"

[AD  Abstract] 


Paden, B. R., "Information Retrieval: Punched-Card Equipment,"
Special Libraries, Vol. 50, May-June 1959, pp. 197 - 200.

"The second in a series of four articles by an IBM senior
mathematician programmer, this one being devoted to the
functions of the IBM keypunch, sorter, accounting machine
and collator, as they relate specifically to retrieval
methods."

[AD Abstract]


Paden, B. R., "Information Retrieval: Punched Card Techniques and
Special Equipment," Special Libraries, Vol. 50, No. 6, July-August
1959, pp. 244 - 249.

"The third in a series of four articles, this one dealing
with the look-up and compare technique using a collator
and unit-record cards which have been sorted in descriptor
files. Included are brief explanations of coordinate
descriptors, superimposable numeric coding and scanning, the
universal card scanner, and the special index analyzer."

[AD Abstract]


Peakes, Gilbert L., "Report Indexing by Machine-Sorted Punched Cards,"
Punched Cards, Their Applications to Science and Industry, edited by
Robert S. Casey and James W. Perry, New York, Reinhold Publishing
Corporation, 1951, pp. 115 - 136.

Although  concerned  primarily  with  coding  techniques  and  sorting 
techniques,  this  paper  does  contain  a  recognition  that  punched  card 
devices  can  be  used  to  search  for  an  intersection  of  headings.  It 
presents  this  notion  by  pointing  out  that  one  of  the  advantages  of 
punched  card  systems  is  that  such  a  system  can  reduce  the  number  of 
cards  which  must  be  employed  in  a  manual  system  when  any  item  is 
indexed  by  more  than  one  heading.  The  system  described  by  Mr.  Peakes 
had  up  to  seven  headings  selected  from  seven  different  categories, 
i.e.,  product,  customer  name,  raw  materials,  processing,  etc.  A 
search  could  be  made  by  any  combination  of  terms  punched  on  a  card. 
Since Mr. Peakes proposed using simple sorting equipment, he
recognized that such a search would have to be linear, that is, a search
for  a  second  term  within  a  deck  selected  from  the  total  file  by  a 
search  for  the  first  term. 


Perry, James W., "Indexing, Classifying, and Coding the Chemical
Literature," Industrial and Engineering Chemistry, 40, May 1948,
pp. 476 - 477.

Although  in  this  and  in  many  subsequent  papers  Perry  continued  to 
insist  that  mechanization  must  await  the  solution  of  problems  of 
nomenclature  and  semantics,  this  paper  does  contain  an  excellent 
statement  of  the  manner  in  which  mechanical,  i.e.,  coordinate, 
search differs from standard indexing and classification
systems:

"We now have available mechanical tools, both simple and
complex, which offer promise for escaping the limitations
previously imposed on indexing and classifying systems.
For example, punched cards are able to register separately
and independently a fairly large number of criteria which
may characterize any single entity among a group of its
fellows. Punched cards, furthermore, permit us to use any
desired combination of such criteria in carrying out a
search to isolate certain entities characterized by the
desired combination of criteria. Owing to the limited
number of holes that may be punched in a given card,
considerable ingenuity may be required successfully to cope
with the large number of criteria necessary to characterize
chemical subject matter. Such mechanical difficulties may
perhaps be avoided by using other mechanical or electronic
devices."


Perry, James W., "Information Analysis for Machine Searching,"
American Documentation, Vol. I, No. 3, August 1950, pp. 133 - 139.


After commenting on the difficulties which arise in the use of
conventional classification and indexing systems, Perry proposes that
machine  methods  will  make  possible  the  search  by  various  combinations 
of  indexing  terms.  There  seems  to  be  no  realization  that  the  machine 
will  search  by  products,  sums,  or  complements  of  classes  but  it  is 
stated  that  machines  will  make  possible  searches  according  to  the 
following  possibilities: 

(A or B) + (C or D)

(A  or  B)  +  (C  -  D) 

(A  +  B)  or  (C  +  D) 

(A  or  B  or  C)  -  (C  +  D  +  E) 

(A  or  B)  -  (C  or  D) 

There  is,  of  course,  some  analogy  between  this  schema  and  a  Boolean 
schema  but  it  was  not  until  some  time  later  that  Perry  realized  this 
fact.  On  the  other  hand,  the  failure  to  appreciate  the  logic  of 
machine  search,  while  it  led  to  certain  errors  in  symbolism  and 
particular  descriptions  of  the  search  process,  does  not  detract  from 
Perry’s  early  empirical  feel  for  how  the  machines  operated  and  the 
searching  techniques  they  would  make  possible. 
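
The analogy with a Boolean schema can be made concrete by evaluating the five forms as operations on sets of document numbers. The sketch below rests on an assumed reading of the notation, since the annotation does not define Perry's symbols: "or" is taken as the logical sum, "+" as the logical product, and "-" as "but not." The postings are hypothetical.

```python
# Hypothetical postings: the document numbers indexed under each term.
A = {1, 2, 3}
B = {3, 4}
C = {2, 4, 5}
D = {5, 6}
E = {4, 5}

# Assumed reading of the notation: "or" = logical sum (union),
# "+" = logical product (intersection), "-" = "but not" (difference).
results = {
    "(A or B) + (C or D)": (A | B) & (C | D),
    "(A or B) + (C - D)": (A | B) & (C - D),
    "(A + B) or (C + D)": (A & B) | (C & D),
    "(A or B or C) - (C + D + E)": (A | B | C) - (C & D & E),
    "(A or B) - (C or D)": (A | B) - (C | D),
}

for expression, documents in results.items():
    print(expression, "->", sorted(documents))
```

Under this reading every form reduces to products, sums, and complements of classes, which is the point the annotation makes about the logic of machine search.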


Perry, [...] Corporation, 1951, pp. 267 - 275.

In this paper, as in many others by Perry, there is a strong emphasis
upon the necessity for the proper sorts of indexing and coding before
any great utility from the machine can be realized. He argues that

"a punched-card code must be based on well-defined,
carefully selected terminology, and must be used in a
consistent, standardized fashion when incorporating new
items into the file."


Certainly this is true. Perry does not emphasize, however, that
mechanical selection, while requiring as much rigor in selection of
terms as any other indexing system, does free the indexer from the
necessity of decisions concerning the order of the terms to be used
in a search.


Perry, James W., "The Utilization of Scientific Knowledge,"
Scientific Monthly, 66, May 1948, pp. 413 - 417.


In this report of the work of the Punched Card Committee of the
American Chemical Society, most of the discussion concerns
abstracting and coding problems. With reference to coordinate searching,
Mr. Perry in this paper is less sanguine than in his paper,
"Indexing, Classifying, and Coding the Chemical Literature," and concludes
that efficient mechanical search of large files will not be possible
until new coding systems are developed for chemical compounds.


Rakov, B. M., and Cherenin, V. P., "Machines for Retrieving
Information in the U. S. S. R.," Unesco Bulletin for Libraries, Vol. 11,
No. 8-9, August-September 1957, pp. 192 - 197.


"The theory and construction of the experimental
information-retrieving machine, EIM, is described. The machine, which
is based on the C-80-1 analyzing computer, was built in 1954
by the Institute of Scientific Information of the Soviet
Academy of Sciences. It uses 80-column, 12-line punched
cards on which information may be entered by position,
nonposition, superimposition, or direct codes. Multiple codes
may be combined on a single card by dividing the card into
code zones. The modified standard alphabetical puncher used
can be set to switch codes automatically as the appropriate
zone is reached. For retrieval, questions are fed through a
standard switchboard or through a special panel with one
switch for each spot on the card. Cards are scanned once at
a rate of 7 per second; characteristics of question and
answer are compared; and the cards are sorted into accepted
and rejected slots on the basis of logical exclusion
principles."

[AD  Abstract] 


Ray, L. C., and Kirsch, R. A., "Finding Chemical Records by Digital
Computers," Science, 126 (3278), October 25, 1957, pp. 814 - 819.

By treating chemical structures as spatial arrangements of atoms, a
group at the Bureau of Standards developed a way of searching for
chemical structures by numbering the position of the atoms, starting
from any arbitrary point in the structure. Presumably the system
provided for the possibility of generic search for compounds having
any portion of their structures in common. The method provides not
only a search by coordination, but search in terms of the order of
terms, that is to say, it uses more than Boolean functions of terms.
The system resembles that described by H. P. Luhn in "A Serial
Notation for Describing the Topology of Multidimensional Branched
Structures."


Rockwell, Harriet E., Hayne, Robert L., and Garfield, Eugene, "A
Unique System for Rapid Access to Large Volumes of Pharmacological
Data; Application to Published Literature on Chlorpromazine,"
Federation Proceedings, Vol. 16, No. 3, September 1957, pp. 726 -
731.


"Use of IBM punched cards with multiple coding for information
retrieval in the pharmacological field."

[AD Abstract]


Schultz, C. K., and Shepherd, C. A., "A Computer Analysis of the
Merck Sharp and Dohme Indexing System." (ONR Contract Nonr-2297(00)
UP 040-116). [n.d.]

This report describes "an empirical study of a satisfactorily
functioning punched card system." Although a great deal of
descriptive information was derived, it was concluded that the results
of the study did not permit any evaluation of the system described
or of other systems.


Shannon, Claude E., and Weaver, Warren, The Mathematical Theory of
Communication, Urbana, The University of Illinois Press, 1949.

This volume is relevant to the development of coordinate indexing in
two respects. Shannon's equations for the capacity of communication
channels are relevant to the design of efficient information systems
and coding for such systems. In the second place, Shannon's very
definition of information as the logarithm of the number of available
choices from a set of messages, when applied to information systems,
has the result that the freer the system, the more information it
contains. In other words, a system of coordinate indexing without
roles and links and without any requirement for categorizing or
ordering terms and in which all the terms could be coordinated with
one another would have more information in it than any system having
one or another of the above constraints.
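
The comparison between a free and a constrained system can be put in a short worked form. The sketch below is illustrative only and is not taken from Shannon and Weaver: the symbols n, k, and m_i, the assumption that all choices are equally likely, and the use of a one-term-per-category rule as the example constraint are all assumptions made here.

```latex
% Information as the logarithm of the number N of available choices:
H = \log_2 N

% Free coordination: any subset of the n terms may describe an item.
N_{\text{free}} = 2^{n}, \qquad H_{\text{free}} = \log_2 2^{n} = n \ \text{bits}

% Constrained system (assumed example): the n terms are divided into
% k categories of sizes m_1, \dots, m_k with \sum_{i=1}^{k} m_i = n,
% and exactly one term is taken from each category.
N_{\text{cat}} = \prod_{i=1}^{k} m_i, \qquad
H_{\text{cat}} = \sum_{i=1}^{k} \log_2 m_i

% Since \log_2 m < m for every integer m \ge 1:
H_{\text{cat}} = \sum_{i=1}^{k} \log_2 m_i \;<\; \sum_{i=1}^{k} m_i = n = H_{\text{free}}
```

With 12 terms, for instance, free coordination allows 2^12 = 4096 descriptions (12 bits), while three categories of four terms each allow only 4 x 4 x 4 = 64 (6 bits).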

The relevance of Shannon's work to problems of information storage
and retrieval has often been questioned. However, this relevance
can best be presented by considering together the following
statements from the book:

"The  semantic  aspects  of  communication  are  irrelevant  to 
the  engineering  aspects." 

"This  does  not  mean  that  the  engineering  aspects  are 
necessarily  irrelevant  to  the  semantic  aspects." 

If one properly understands these two statements, one can also
understand why mechanized systems and coding can contribute to the
semantic aspects of information storage and retrieval systems and why
semantic considerations cannot contribute to the solution of problems
of mechanization (engineering aspects). Suppose one wished to
develop a high-fidelity system for the reproduction or transmission of
music. Such a high-fidelity system, properly engineered, might
convey a good violin tone, i.e., the engineering would contribute to
the esthetics. On the other hand, whether or not violinists in
general played sweet or sour notes would make no contribution to the
development of high-fidelity systems, i.e., esthetics would not
contribute to the engineering. We are only interested in storage and
retrieval systems because individuals can index material, although
some index poorly. Whether the indexing is good or bad does not
contribute to the engineering aspects or the mechanization of storage
and retrieval systems. On the other hand, good mechanized systems
can convey the results of good indexing.


Shera, Jesse H., "Classification: Current Functions and Applications
to the Subject Analysis of Library Materials," The Subject Analysis
of Library Materials (Papers Presented at an Institute, June 24-28,
1952, under the Sponsorship of the School of Library Service,
Columbia University, and the A. L. A. Division of Cataloging and
Classification), Edited and Introduction by Maurice F. Tauber,
New York, Columbia University School of Library Service, 1953, pp.
29 - 42.

After a discussion of traditional library classification and its
limits, Shera introduces symbolic logic as providing a new type of
class order which is non-hierarchical. Shera supposes that such
new types of order and new types of indexing are in some sense
derived from the logic. Actually, the logic serves only to describe
the type of order and the type of indexing. Shera also supposes that
the class relations described in a Boolean algebra are a species of
the genus classification, which also includes hierarchical
classification. Most logicians have argued that hierarchical classification
is a special classification which utilizes exclusively the relation
of "inclusion" between classes.


Taube, Mortimer, "Functional Approach to Bibliographic Organization:
A Critique and a Proposal," (Presented before the Fifteenth Annual
Conference of the University of Chicago Graduate Library School,
July 24-29, 1950), Bibliographic Organization (edited by Jesse H.
Shera and Margaret E. Egan), Chicago, University of Chicago Press,
1951, pp. 57 - 71.

A major portion of this paper is concerned with a critique of
traditional hierarchical classification and alphabetical subject
heading systems. It concludes that such systems can never be the
basis of national or international systems of bibliographical
organization. As a third possibility, it recommends the construction of
categories of terms and the use of a set of terms to index any
document, such a set to be constructed by selecting one term from each
category. Although there is here an implicit recognition of the
intersection of terms to index a document, the logic of this
intersection was not set forth and the paper emphasized primarily the
arrangement of the terms in categories.


Taube, Mortimer, "Specificity in Subject Headings and Coordinate
Indexing," Library Trends, Vol. 1, No. 2, October 1952, pp. 219 - 223.

Whereas in a subject heading system specificity of indexing is
provided by subdivision, in a coordinate indexing system such specificity
is provided by intersecting terms or by determining product classes.
In principle, the use of enough subdivisions in a subject heading system
could provide the same degree of specificity as a system of coordinate
indexing. However, a subject heading system which limits by
convention the number of subdivisions will also limit by convention the
specificity of any individual heading. The Science and Technology
Project of the Library of Congress, by convention, used only one
subdivision of any given heading and found it necessary to use many
different headings and subdivisions in order to express the contents
of an item being indexed. The limitation on degree of subdivision in
an alphabetical system is usually imposed because of the difficulties
with alphabetization and permutation created by subdivision. These
difficulties disappear in a coordinate index.


Taube,  Mortimer  and  Associates,  "Storage  and  Retrieval  of  Information 
by  Means  of  the  Association  of  Ideas,"  American  Documentation,  Vol. 
VI,  No.  1,  January  1955,  pp.  1  -  18. 

What this paper calls "association of ideas" later came to be part of
the technique of thesaurus building employed by Wall and Costello for
the du Pont Company and H. P. Luhn at IBM. Any search by a Boolean
function of classes may lead to a negative answer because the class
created by an intersection of classes might have no members. In
order to provide for mechanized systems some indication of the nature
of the material in the system corresponding to the ability to browse
in a manual index, it was felt that a mechanized system should
display to the searcher the class intersections which had members. The
technique chosen to achieve this end was as follows: For every term
in the system there was created a logical sum of all other terms used
with that term to index any document. Each term, then, headed a
subset of terms in the system "associated" with it. The searcher could
input any term and have displayed to him this subset of terms which
could guide him in making subsequent intersections for searching.

The technique chosen was limited to the co-occurrence of two terms
only.
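
The construction of the "associated" subsets described above can be
sketched in a few lines of Python. This is only an illustrative
reading of the annotation, not the procedure of the paper itself: the
function name build_associations, the dictionary layout, and the
sample terms and document numbers are all assumptions introduced here.

```python
from collections import defaultdict
from itertools import combinations


def build_associations(postings):
    """For every term, form the logical sum (union) of all other terms
    that were used together with it to index at least one document.

    `postings` maps a document number to the list of terms assigned to it.
    Only pairwise co-occurrence is recorded, as in the annotation.
    """
    assoc = defaultdict(set)
    for terms in postings.values():
        unique_terms = set(terms)
        for term in unique_terms:
            assoc[term]  # make sure every term heads a (possibly empty) subset
        for a, b in combinations(unique_terms, 2):
            assoc[a].add(b)
            assoc[b].add(a)
    return dict(assoc)


# Hypothetical sample data, invented for illustration only.
postings = {
    1: ["alloys", "corrosion", "steel"],
    2: ["corrosion", "seawater"],
    3: ["alloys", "welding"],
}
assoc = build_associations(postings)

# The searcher inputs one term and is shown the terms associated with it.
print(sorted(assoc["corrosion"]))
```

Any term displayed for "corrosion" is guaranteed to give a non-empty
intersection with it, which is the browsing aid the annotation
describes; because only pairs are recorded, nothing is guaranteed for
intersections of three or more terms.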


Taube, Mortimer and Associates, "Studies in Coordinate Indexing,"
Washington, Documentation Incorporated, Vol. I - V, 1953-1959.


Taube, Mortimer, Gull, C. D., and Wachtel, Irma S., "Unit Terms in
Coordinate Indexing," American Documentation, Vol. III, No. 4,
October 1952, pp. 213 - 218.

This paper was one of the first public announcements of the Uniterm
system. Besides describing the type of card and the method of posting,
it discussed the creation of vocabularies by breaking up standard
subject headings and classification systems into sets of terms. It
also discussed the problem of multiple-word indexing terms and
presented a suggested rule for combining or separating words in an
indexing term. The rule has become known in the literature as the
rule for "free" and "bound" terms. Finally, the paper presented a
set of rules for organizing a Uniterm system.


Taube, Mortimer, "The Coordinate Indexing of Scientific Fields,"
(Read before the Symposium on Mechanical Aids to Chemical
Documentation of the Division of Chemical Literature, September 4, 1951),
unpublished.

The  substance  of  this  paper  has  been  presented  in  the  text. 


Taube, Mortimer and Wooster, Harold, "Information Storage and
Retrieval, Theory, Systems, and Devices," (Air Force Office of
Scientific Research Symposium), New York, Columbia University Press,
1958. (Number Ten, Columbia University Studies in Library Service).


Thorne, R. G., "The Efficiency of Subject Catalogues, and the Cost
of Information Searches," Farnborough, England, Royal Aircraft
Establishment, April 1955, 21 p.

"An  expression  for  the  efficiency  of  a  subject  catalogue  or 
index is derived from the probability of success when using
the catalogue, and the cost of making and using the
catalogue, compared with the cost of finding material in the
library stock when no subject catalogue is available.

"The  method,  developed  primarily  for  assessing  the  efficiency 
of  subject  catalogues  or  indexes,  can  be  applied  also  to 
author  catalogues  and  other  bibliographic  aids. 

"Numerical  examples  illustrate  application  of  the  method  to 
data from tests of the Catalogue of Aerodynamic Data
developed by the National Aeronautical Research Institute,
Amsterdam, the Uniterm System of Coordinate Indexing, and
the Universal Decimal Classification Catalogue of the
R. A. E. Library."

[Author’s  Summary] 


Tyler, A. W., Myers, W. L., and Kuipers, J. W., "The Application of
the Kodak Minicard System to Problems of Documentation," American
Documentation, Vol. VI, No. 1, January 1955, pp. 10 - 26.

Although essentially concerned with the description of a device,
this paper is an example of the movement away from pure forms of
coordinate indexing. The Minicard Selector was designed to have
the capacity to select by several different groups of terms.
Within each group the terms were related by the regular Boolean functions
but the Selector could operate on several groups at the same time.

The  relation  between  the  groups  was  presumably  of  a  different  type 
from  the  relation  between  the  terms  in  any  one  group.  This  provided 
the  possibility  of  "multi-level"  searching. 

Through  its  use  of  high-reduction  unitized  film,  the  Minicard  System 
also  departed  from  simpler  types  of  coordinate  indexing  by  posting 
the  total  text,  rather  than  a  number  designating  the  text,  under 
each term.


Research and Development Reports, No. 1 - 20, 1956-1961. U. S. Patent
Office, Washington, D. C.

This series of reports, prepared under the direction of Mr. Don
Andrews, Chief of the Office of Research and Development, United
States Patent Office, and by the Bureau of Standards group working
with the Patent Office in accordance with the recommendations of
the Bush Committee, describes various attempts to code chemical
compounds and to utilize punched card equipment and the Bureau of
Standards SEAC for mechanizing searching of patents. In addition,
a number of papers by Newman attempt the creation of a new language,
which he calls "Ruly English." In this language, every idea would
have one word and every word would express only a single idea. To
the extent that the creation of such a language is a sine qua non of
mechanizing the storage and retrieval of information, it can be
assumed that this goal is impossible. The ad hoc experiment carried
out by the Patent Office and Bureau of Standards groups has not led
to any real advance in the art or pointed out the direction in which
the mechanization of Patent Office search is to be achieved. The
Kelly Report on The Role of the Department of Commerce in Science
and Technology sums up the work described in these reports as
follows:

"On  the  other  hand,  there  is  a  possibility  that  here  research 
has proceeded in the wrong order, in that components of the
program have been designed before sufficient thought was
given to the operation of the over-all system. Thus there
may be here a warning that when research is bound too closely
to production, there may be a pressure to show that
something specific is being worked on; since initially the only
specific things are components, there may result a
misdirection of effort."


Vickery,  B.  C.,  Classification  and  Indexing  in  Science,  London, 
Butterworths Scientific Publications, 1958.

A  defense  of  the  traditional  British  view  that  classification  is 
prior  to  indexing,  in  this  case,  that  classification  is  prior  to 
coordinate indexing.


Vickery, B. C., "The Function of Classification in Information
Retrieval," ASLIB Aeronautical Group, Fourth Annual Conference,
Cranfield, April 1954.

This paper is one of the earliest expressions of a complex of views
which has characterized British work in the I. R. field. This view
is reflected in the Cranfield Project and has received fuller
expression in Vickery's volume, Classification and Indexing in Science.

Vickery, following Robert Thorne of RAE, set himself the task of
comparing different indexing systems. This task is, of course, the
basis of the research project which Cleverdon has carried out under
National Science Foundation auspices at Cranfield. Vickery
distinguishes four types of retrieval systems: the alphabetical subject
index, the coordinate indexing system, the classified index, and
automatic selection. Automatic selection is obviously not a system
of indexing like the other three, but a method which can be used with
the other three. The paper concludes that whereas a coordinate
indexing system may be satisfactory for the actual indexing operation,
such a system must be supplemented with a classification of terms.

The classification of terms provides clues to the use of the indexing
system for anyone unfamiliar with its vocabulary. What Vickery calls
the classification of terms has been referred to in other papers as
categorization of terms, although the English have attempted to set
up very rigorous schedules of terms, following both the UDC and the
type of "faceted" analysis developed by Ranganathan. Vickery does
not provide any evidence for the basic question he raises, namely,
whether categorization of terms is possible for general systems.
Although many people have suggested the development of such systems,
no one has actually produced a general categorization or
classification of a total vocabulary. A distinction is being made here
between such a total vocabulary and a categorization of terms in a
limited area, e.g., instrumentation.


Vickery, B. C., "Problems in the Construction of Information
Retrieval Systems," Journal of Documentation, Vol. 14, No. 3, September
1958, pp. 136 - 143.

"Speaking  for  the  Classification  Research  Group,  the  author 
lists unsolved problems that ought to be tackled
systematically. He states the basis of an information retrieval
system to be a lattice of terms with potentially unlimited
interconnections, from among which every system must select
certain interconnections for display. The selection is
based on postulates concerning the semantic level of
terms, the categories of terms to be used, the generic,
coordinate, and conjunctive relations to be displayed,
and the types of search operation to be conducted. The
first problem is to explore the variety of postulates
that can be made, and to assess their situations. Other
problems are detailed in similar manner, followed by
discussions and extracts of the CRG meetings."

[AD  Abstract] 

Vickery,  together  with  the  CRG,  which  he  dominates,  has  been  one 
of  the  strongest  forces  opposed  to  free  or  even  relatively  free 
I.  R.  systems.  They  have  insisted  on  the  fundamental  importance 
of  tightly  structured  hierarchical  systems. 

Vickery, B. C., "Some Comments on Mechanical Selection," American
Documentation, Vol. II, No. 2, April 1951, pp. 102 - 107.

This  paper  is  one  of  the  earliest  criticisms  of  coordinate  indexing 
based  upon  the  presumed  superiority  of  a  classification  system  to 
suggest  the  terms  to  be  used  in  both  indexing  and  searching.  Vickery 
recognizes  that  specific  subjects  can  be  both  indexed  and  retrieved 
by  combinations  of  terms  but  he  feels  that  both  the  indexer  and  the 
searcher  need  more  than  an  alphabetical  list  of  such  terms  in  order 
to  use  a  system.  He  suggests  that  any  system  of  mechanical  selection 
be  supplemented  by  an  alphabetical  index  to  a  systematic  classification 
from  which,  in  turn,  the  terms  used  in  indexing  could  be  derived. 


Wachtel, Irma, "A Punched Card Index for Nuclear Data," American
Documentation, Vol. III, No. 1, January 1952, pp. 56 - 57.

This paper describes one of the earliest applications of the Batten
or optical coincidence principle to the indexing of a special field
of information. The index was designed to enable physicists "to
determine quickly and easily which nuclides possess specified
combinations of properties." A card is set up for each property and each
punching position on the card represents a particular nuclide. Each
nuclide has the same position on every card. The search for any
nuclide or nuclides having a specific combination of properties is
made by superimposing the selected property cards on one another and
noting those areas which have holes on all the superimposed cards.
This method of searching delivers only product classes. It is a
characteristic of optical matrix systems that they are very efficient
for product searches and cannot readily be used for sum or complement
searches.
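
The superimposition of property cards can be modelled, as a rough
sketch, by treating each card as a bit pattern in which a set bit
stands for a punched hole. The nuclide labels, property names, and
helper functions below are hypothetical placeholders introduced for
illustration; they are not taken from the paper and carry no physical
data.

```python
# Fixed punching positions: every nuclide has the same position on every card.
NUCLIDES = ["N0", "N1", "N2", "N3", "N4"]  # placeholder labels


def punch(members):
    """Return a property card as an integer bit mask.

    Bit i is set (a hole is punched) when NUCLIDES[i] has the property.
    """
    card = 0
    for name in members:
        card |= 1 << NUCLIDES.index(name)
    return card


def superimpose(*cards):
    """Stack cards and report the positions where light passes through all.

    Light passes only where every card has a hole, so stacking is a
    bitwise AND: the result is the product class of the chosen properties.
    """
    light = (1 << len(NUCLIDES)) - 1  # with no cards, every position is open
    for card in cards:
        light &= card
    return [name for i, name in enumerate(NUCLIDES) if (light >> i) & 1]


# Two invented property cards.
property_a = punch(["N0", "N1", "N3"])
property_b = punch(["N1", "N3", "N4"])

print(superimpose(property_a, property_b))
```

Because a stack of cards can only block light, the physical operation
corresponds to intersection alone; a sum or a complement has no
equally direct counterpart in the stack, which is consistent with the
limitation noted in the annotation.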


Wall, Eugene, "A Practical System for Documentation," Library Journal,
Vol. 85, No. 5, March 1, 1960, pp. 883 - 897.

"This  paper  is  concerned  with  a  review  of  the  fundamentals 
and principles of building an information system.
Discusses problems of viewpoint, generics, semantics, and
syntactics and their solution through prescription of
vocabulary or through redundancy in storage or in
retrieval. The technical thesaurus is considered as a
means of solving semantic and generic problems. Unit
terms and resulting syntactical problems, role
indicators, arrangement of units in the system, and abstracts
are other aspects discussed."

[AD  Abstract] 


Warheit, I. A., "Evaluation of Library Techniques for the Control of
Research Materials," American Documentation, Vol. VII, No. 4,
October 1956, pp. 267 - 275.

This paper is an examination of proposed mechanized systems,
including especially the Uniterm System of coordinate indexing, as
described in the early papers. It is quite critical of the claims
made for coordinate indexing.


Weinstein,  Shirley  J.,  and  Drozda,  Raymond  J.,  "Adaptation  of 
Coordinate  Indexing  System  to  a  General  Literature  and  Patent  File: 
Machine  Posting,"  American  Documentation,  Vol.  10,  No.  2,  April 
1959,  pp.  122  -  129. 

"Describes  a  procedure  for  using  IBM  punched-card  equipment 
to tabulate document numbers for a Uniterm index."

[AD Abstract]


Wildhack,  W.  A.,  Stern,  Joshua,  and  Smith,  Julian,  "Documentation  in 
Instrumentation,"  American  Documentation,  Vol.  V,  No.  4,  October 
1954,  pp.  223  -  237. 

This paper contains both a description of a special peek-a-boo
device constructed by the Office of Basic Instrumentation of the
Bureau of Standards and an indexing system to be used with the
device. The indexing system is characterized as "The OBI
Multi-Aspect System." It is a system of coordinate indexing which
eliminates connectives and word order and also variant grammatical forms
of the indexing terms. However, because the system covers only a
very special field, namely, instrumentation, the designers of the
system found it possible to set up the indexing terms in a set of
categories, e.g., Physical Property, Principle of Measurement,
Name of Instrument, Field of Application, etc.


Wise, Carl S. and Perry, James W., "Multiple Coding and the Rapid
Selector," American Documentation, Vol. I, No. 2, April 1950, pp.
76 - 83.

Like many of the early papers on coordinate indexing, this paper
emphasizes coding problems, largely because the devices suggested,
namely, edge-notched cards and even IBM cards, had limited coding
areas. This paper suggests the use of an alphabetical code and
indicates that such a code developed for keysort cards could also be
applied to the proposed Rapid Selector. The major contribution of
the paper to coordinate indexing is its recognition that:

"In  constructing  an  index,  it  is  not  practical  to  provide 
separate entries for every combination of entities,
concepts and operations mentioned in the material being
indexed. If this were attempted, the index would be too
bulky. Nor is it practical to establish as separate
classes and sub-classes every possible permutation of
all basic criteria used in classification. If this were
attempted, the resulting complexity of the system would
defeat its own purpose."

However, the bulk of the paper is concerned with superimposition of
word codes and the appearance of the paper resulted in a reply from
Mr. Calvin Mooers.
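
The bulk that the quoted passage warns against can be made concrete
with a small count. The sketch below is an illustration added here,
not a calculation from the paper; the function names are hypothetical,
and it assumes an item indexed by k distinct terms, with a
"combination" meaning any unordered set of two or more of them and a
"permutation" meaning any ordered arrangement of one or more of them.

```python
from math import comb, factorial


def combination_entries(k):
    """Unordered combinations of two or more of k terms: 2**k - 1 - k."""
    return 2**k - 1 - k


def permuted_entries(k):
    """Ordered arrangements of every non-empty selection from k terms."""
    return sum(comb(k, r) * factorial(r) for r in range(1, k + 1))


for k in (3, 5, 10):
    # A coordinate index needs only k postings for the same item.
    print(k, combination_entries(k), permuted_entries(k))
```

Under these assumptions an item with five terms would need 26
combination entries or 325 permuted entries, against five postings in
a coordinate index, and the gap widens rapidly as k grows.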


Wise, Carl S., "A Punched Card File Based on Word Coding," Punched
Cards, Their Applications to Science and Industry, edited by Robert
S. Casey and James W. Perry, New York, Reinhold Publishing
Corporation, 1951, pp. 93 - 113.

Only the introduction of this paper points out that searches by
mechanical sorting methods "may be directed to combinations of words
and phrases." The balance of the paper is concerned with the use
of letter codes having mnemonic qualities as opposed to numerical
codes, and a consideration of the mathematics of coding.


Wise, Carl S., "Multiple Word Coding vs Random Coding for the Rapid
Selector. A Reply to Calvin N. Mooers," American Documentation,
Vol. III, No. 4, October 1952, pp. 223 - 225.

In this reply to Calvin Mooers, Wise defends the use of letter coding
as opposed to the random number coding recommended by Mooers. Wise
admits that word coding may lack the efficiency of random number
coding but he thinks this lack of coding efficiency is more than
made up by the mnemonic character of word coding which might
eliminate the necessity of dictionary look-up at both input and output.
Wise argues further that letters can be randomized just as numbers
can be randomized and that if one were willing to give up the
mnemonic characteristic of word coding, one could use letters just
as efficiently in coding as one uses numbers.